Abstract

This paper investigates theoretical properties and efficient numerical algorithms for the
so-called elastic-net regularization originating from statistics, which simultaneously enforces
$\ell^1$ and $\ell^2$ regularization. The stability of the minimizer and its consistency are studied, and
convergence rates for both a priori and a posteriori parameter choice rules are established.
Two iterative numerical algorithms of active set type are proposed, and their convergence
properties are discussed. Numerical results are presented to illustrate the features of the
functional and algorithms.



1 Introduction 

In recent years, minimization problems involving so-called sparsity constraints have gained
considerable interest. Sparsity has been found to be a powerful tool and recognized as an important
structure in many disciplines, e.g. geophysical problems [251 H2], imaging science [IS], statistics
[ST] and signal processing [HIS]. The setting is often as follows: Let $\mathcal{H}_1$ and $\mathcal{H}_2$ be two Hilbert
spaces and let $\mathcal{H}_1$ be equipped with an orthonormal basis $\{\varphi_i \in \mathcal{H}_1 : i \in \mathbb{N}\}$ (or an overcomplete
dictionary). Then, for a given linear and continuous operator $K : \mathcal{H}_1 \to \mathcal{H}_2$, data $y^\delta \in \mathcal{H}_2$ and
regularization parameter $\alpha > 0$, we seek the minimizer of the functional

\[ \Psi(x) = \frac12\|Kx - y^\delta\|^2 + \alpha\sum_i |\langle x, \varphi_i\rangle|. \]

Here $y^\delta$ is an observational version of the exact data $y^\dagger$ and satisfies an estimate of the form
$\|y^\delta - y^\dagger\| \le \delta$. With the help of the basis expansion, the problem can be reformulated as

\[ \min_x \Psi(x) \quad\text{with}\quad \Psi(x) = \frac12\|Kx - y^\delta\|^2 + \alpha\|x\|_{\ell^1}, \tag{1} \]

by abusing the notation $x$ for the sequence of expansion coefficients $\{x_i := \langle x, \varphi_i\rangle\}$ and $K$ for the
operator $\{x_i\} \mapsto K\sum_i x_i\varphi_i$ mapping from $\ell^2$ to $\mathcal{H}_2$.

Because of its central importance in inverse problems and signal processing, the efficient
minimization of the functional $\Psi$ has received much attention, and a wide variety of numerical
algorithms, e.g. iterated thresholding/shrinkage 013]), gradient projection [T3J[5S], fixed point
continuation [TB], semismooth Newton method (SSN) [15] and feature sign search (FSS) [21], have been
proposed. Both SSN and FSS are of active set type, and have delivered favorable performance
compared to the above-mentioned first-order methods. However, they often require inverting
potentially ill-conditioned operators, and thus lead to numerical problems. One possible remedy is to
regularize the inversion, e.g. by Tikhonov regularization. On the other hand, recent studies [23, 14]
show the regularizing property of the functional $\Psi$ and, under suitable source conditions, also the
convergence rate of its minimizer $x_\alpha^\delta$ to the true solution $x^\dagger$ of the form

\[ \|x_\alpha^\delta - x^\dagger\|_{\ell^2} = O(\delta). \]

However, the involved constant may be astronomically large. In other words, the ill-posed
problem has been turned into a well-posed but ill-conditioned one, and this is in accordance with
inverting ill-conditioned operators. In this paper we propose to address both issues by Tikhonov
regularization, i.e. considering a functional of the form

\[ \Phi_{\alpha,\beta}(x) = \frac12\|Kx - y^\delta\|^2 + \alpha\|x\|_{\ell^1} + \frac{\beta}{2}\|x\|_{\ell^2}^2. \]

We will show that this functional leads to more stable active-set algorithms and provides improved
error estimates. We note that it also arises by Moreau-Yosida regularization of the Fenchel dual
of the functional $\Psi$.
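
To make the functional concrete in finite dimensions, the following is a minimal sketch that evaluates the elastic-net functional and minimizes it by a plain proximal-gradient (iterated shrinkage) iteration. It is not one of the active set algorithms of this paper; the function names, the matrix `K` and the iteration count are illustrative assumptions.

```python
import numpy as np

def soft_threshold(v, tau):
    """Componentwise shrinkage: sign(v) * max(|v| - tau, 0)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def elastic_net_value(K, y, x, alpha, beta):
    """0.5*||Kx - y||^2 + alpha*||x||_1 + 0.5*beta*||x||_2^2."""
    r = K @ x - y
    return 0.5 * (r @ r) + alpha * np.abs(x).sum() + 0.5 * beta * (x @ x)

def elastic_net_ista(K, y, alpha, beta, iters=2000):
    """Proximal-gradient iteration; the smooth part is 0.5*||Kx - y||^2 + 0.5*beta*||x||^2."""
    # Step size 1/L, where L bounds the Lipschitz constant of the smooth gradient.
    step = 1.0 / (np.linalg.norm(K, 2) ** 2 + beta)
    x = np.zeros(K.shape[1])
    for _ in range(iters):
        grad = K.T @ (K @ x - y) + beta * x
        x = soft_threshold(x - step * grad, step * alpha)
    return x
```

For $K$ equal to the identity the minimizer is available in closed form, namely soft thresholding of the data by $\alpha$ followed by division by $1 + \beta$, which gives a simple sanity check for the sketch.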

The functional $\Phi_{\alpha,\beta}$ is also used in statistics under the name elastic-net regularization [30].
It is motivated by the following observation: The functional $\Psi$ delivers undesirable results for
problems where there are highly correlated features and we need to identify all relevant ones,
e.g. microarray data analysis, in that it tends to select only one feature out of the relevant group
instead of all relevant features of the group [30], i.e. it fails to identify the group structure. Zou
and Hastie [30] proposed introducing an extra $\ell^2$ regularization term, i.e. the functional $\Phi_{\alpha,\beta}$, in
the hope of correctly retrieving the whole relevant group, and numerically confirmed the desired
property of the functional for both simulation studies and real-data applications. For further
statistical motivations we refer to reference [30]. Quite recently, De Mol et al. [10] showed some
interesting theoretical properties of the functional $\Phi_{\alpha,\beta}$, but their focus is fundamentally different
from ours: Their main concern is its statistical properties in the framework of learning theory
and an algorithm of iterated shrinkage type, whereas ours is within the framework of classical
regularization theory and algorithms of active set type.

The rest of the paper is organized as follows. In Section 2 we investigate theoretical
properties, e.g. stability and consistency of the minimizers of the elastic-net functional. In particular,
the convergence rates for both a priori and a posteriori regularization parameter choice rules are
established under suitable source conditions. In Section 3 we propose two active set algorithms,
i.e. the RSSN and RFSS, for efficiently minimizing the functional $\Phi_{\alpha,\beta}$, and discuss their
convergence properties. In Section 4 numerical results are presented to illustrate the salient features of
the algorithms.



2 Properties of elastic-net regularization 

In this section we investigate the stability and regularizing properties of elastic-net regularization.
Both a priori and a posteriori rules for choosing the regularization parameters are
considered. We shall denote the minimizer of the functional $\Phi_{\alpha,\beta}$ by $x_{\alpha,\beta}^\delta$ below, and occasionally
suppress the superscript $\delta$ for notational simplicity. Observe that for every $\beta > 0$, the functional
$\Phi_{\alpha,\beta}$ is strictly convex, and thus admits a unique minimizer.

2.1 Stability of the minimizers $x_{\alpha,\beta}^\delta$

Theorem 2.1. For the minimizer $x_{\alpha,\beta}^\delta$ with $\alpha, \beta > 0$ there holds

\[ \lim_{(\alpha_n,\beta_n)\to(\alpha,\beta)} x_{\alpha_n,\beta_n}^\delta = x_{\alpha,\beta}^\delta. \]

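Proof. The minimizing property of $x^n \equiv x_{\alpha_n,\beta_n}^\delta$ implies that the sequences $\{\|Kx^n - y^\delta\|\}$, $\{\|x^n\|_{\ell^1}\}$,
and $\{\|x^n\|_{\ell^2}\}$ are uniformly bounded. In particular there exists a subsequence of $\{x^n\}_n$, also
denoted by $\{x^n\}_n$, converging weakly to some $x^* \in \ell^2$.
By the weak continuity of $K$ and weak lower-semicontinuity of norms, we have

\[ \|Kx^* - y^\delta\| \le \liminf_{n\to\infty}\|Kx^n - y^\delta\|, \quad \|x^*\|_{\ell^1} \le \liminf_{n\to\infty}\|x^n\|_{\ell^1} \quad\text{and}\quad \|x^*\|_{\ell^2} \le \liminf_{n\to\infty}\|x^n\|_{\ell^2}. \tag{2} \]

Consequently, we have

\begin{align*}
\Phi_{\alpha,\beta}(x^*) &= \frac12\|Kx^* - y^\delta\|^2 + \alpha\|x^*\|_{\ell^1} + \frac{\beta}{2}\|x^*\|_{\ell^2}^2 \\
&\le \frac12\liminf_{n\to\infty}\|Kx^n - y^\delta\|^2 + \liminf_{n\to\infty}\alpha_n\|x^n\|_{\ell^1} + \liminf_{n\to\infty}\frac{\beta_n}{2}\|x^n\|_{\ell^2}^2 \\
&\le \liminf_{n\to\infty}\Big\{\frac12\|Kx^n - y^\delta\|^2 + \alpha_n\|x^n\|_{\ell^1} + \frac{\beta_n}{2}\|x^n\|_{\ell^2}^2\Big\} \\
&= \liminf_{n\to\infty}\Phi_{\alpha_n,\beta_n}(x^n).
\end{align*}

Next we show that $\Phi_{\alpha,\beta}(x_{\alpha,\beta}^\delta) \ge \limsup_{n\to\infty}\Phi_{\alpha_n,\beta_n}(x^n)$. To this end, we observe

\[ \limsup_{n\to\infty}\Phi_{\alpha_n,\beta_n}(x^n) \le \limsup_{n\to\infty}\Phi_{\alpha_n,\beta_n}(x_{\alpha,\beta}^\delta) = \lim_{n\to\infty}\Phi_{\alpha_n,\beta_n}(x_{\alpha,\beta}^\delta) = \Phi_{\alpha,\beta}(x_{\alpha,\beta}^\delta) \]

by the minimizing property of $x^n$. Consequently

\[ \limsup_{n\to\infty}\Phi_{\alpha_n,\beta_n}(x^n) \le \Phi_{\alpha,\beta}(x_{\alpha,\beta}^\delta) \le \Phi_{\alpha,\beta}(x^*) \le \liminf_{n\to\infty}\Phi_{\alpha_n,\beta_n}(x^n). \]

Therefore, $x^*$ is a minimizer of $\Phi_{\alpha,\beta}$, and the uniqueness of its minimizer implies $x^* = x_{\alpha,\beta}^\delta$.
Since every subsequence has a subsequence converging weakly to $x_{\alpha,\beta}^\delta$, the whole sequence $\{x^n\}_n$
converges weakly to $x_{\alpha,\beta}^\delta$. Next we show that $\|x^n\|_{\ell^2} \to \|x_{\alpha,\beta}^\delta\|_{\ell^2}$, for which
it suffices to show that

\[ \limsup_{n\to\infty}\|x^n\|_{\ell^2} \le \|x_{\alpha,\beta}^\delta\|_{\ell^2}. \]

Assume that this does not hold. Then there exists a constant $c$ such that $c := \limsup_{n\to\infty}\|x^n\|_{\ell^2}^2 > \|x_{\alpha,\beta}^\delta\|_{\ell^2}^2$
and a subsequence of $\{x^n\}_n$, denoted by $\{x^n\}_n$ again, such that

\[ x^n \to x_{\alpha,\beta}^\delta \ \text{weakly} \quad\text{and}\quad \|x^n\|_{\ell^2}^2 \to c. \]

By the continuity of $\Phi_{\alpha,\beta}(x_{\alpha,\beta}^\delta)$ in $(\alpha,\beta)$, we have

\begin{align*}
\lim_{n\to\infty}\Big\{\frac12\|Kx^n - y^\delta\|^2 + \alpha_n\|x^n\|_{\ell^1}\Big\} &= \Phi_{\alpha,\beta}(x_{\alpha,\beta}^\delta) - \lim_{n\to\infty}\frac{\beta_n}{2}\|x^n\|_{\ell^2}^2 \\
&= \frac12\|Kx_{\alpha,\beta}^\delta - y^\delta\|^2 + \alpha\|x_{\alpha,\beta}^\delta\|_{\ell^1} + \frac{\beta}{2}\big(\|x_{\alpha,\beta}^\delta\|_{\ell^2}^2 - c\big) \\
&< \frac12\|Kx_{\alpha,\beta}^\delta - y^\delta\|^2 + \alpha\|x_{\alpha,\beta}^\delta\|_{\ell^1}.
\end{align*}

This is in contradiction with the lower-semicontinuity result in equation (2). Therefore we have

\[ \limsup_{n\to\infty}\|x^n\|_{\ell^2} \le \|x_{\alpha,\beta}^\delta\|_{\ell^2}. \]

This together with equation (2) implies that $\|x^n\|_{\ell^2} \to \|x_{\alpha,\beta}^\delta\|_{\ell^2}$, from which and the weak
convergence the desired convergence in $\ell^2$ follows directly. □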

The preceding theorem addresses only the case that both $\alpha$ and $\beta$ are positive. The case
of vanishing $\alpha$ and positive $\beta$ is obviously the same, as the uniqueness of the minimizer to the
functional $\Phi_{0,\beta}$ remains valid. The more interesting case of vanishing $\beta$ will be discussed below. In
general, due to the potential lack of uniqueness for vanishing $\beta$, only subsequential convergence can
be expected. Interestingly, whole-sequence convergence remains true under certain circumstances.
To illustrate the point, we denote by $\mathcal{S}_\alpha$ the set of minimizers to the functional $\Phi_{\alpha,0}$. Clearly
the set $\mathcal{S}_\alpha$ is nonempty and convex as a consequence of the convexity of the functional $\Phi_{\alpha,0}$.
Moreover, denote the minimum-$\gamma\|\cdot\|_{\ell^1} + \frac12\|\cdot\|_{\ell^2}^2$ element of the set $\mathcal{S}_\alpha$ by $x_\gamma^\delta$. Since the
functional $\gamma\|\cdot\|_{\ell^1} + \frac12\|\cdot\|_{\ell^2}^2$ is strictly convex, $x_\gamma^\delta$ is unique.

Proposition 2.2. Let the sequence $\{(\alpha_n,\beta_n)\}_n$ satisfy that for some $\gamma > 0$ and $\alpha > 0$ there holds

\[ \lim_{n\to\infty}\beta_n = 0 \quad\text{and}\quad \lim_{n\to\infty}\frac{\alpha_n - \alpha}{\beta_n} = \gamma. \]

Then we have

\[ \lim_{n\to\infty} x_{\alpha_n,\beta_n}^\delta = x_\gamma^\delta. \]

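Proof. Denote by $x^n = x_{\alpha_n,\beta_n}^\delta$ the unique minimizer of $\Phi_{\alpha_n,\beta_n}$. By repeating the arguments of
Theorem 2.1 we derive that there exists a subsequence of $\{x^n\}_n$, also denoted by $\{x^n\}_n$, that
converges weakly in $\ell^2$ to some $x^*$, and moreover, $x^*$ is a minimizer of $\Phi_{\alpha,0}$, i.e. $x^* \in \mathcal{S}_\alpha$.
The minimizing property of $x^n$ and $x_\gamma^\delta$ implies

\[ \frac12\|Kx^n - y^\delta\|^2 + \alpha_n\|x^n\|_{\ell^1} + \frac{\beta_n}{2}\|x^n\|_{\ell^2}^2 \le \frac12\|Kx_\gamma^\delta - y^\delta\|^2 + \alpha_n\|x_\gamma^\delta\|_{\ell^1} + \frac{\beta_n}{2}\|x_\gamma^\delta\|_{\ell^2}^2 \]

and

\[ \frac12\|Kx_\gamma^\delta - y^\delta\|^2 + \alpha\|x_\gamma^\delta\|_{\ell^1} \le \frac12\|Kx^n - y^\delta\|^2 + \alpha\|x^n\|_{\ell^1}. \]

Adding these two inequalities gives

\[ (\alpha_n - \alpha)\|x^n\|_{\ell^1} + \frac{\beta_n}{2}\|x^n\|_{\ell^2}^2 \le (\alpha_n - \alpha)\|x_\gamma^\delta\|_{\ell^1} + \frac{\beta_n}{2}\|x_\gamma^\delta\|_{\ell^2}^2. \]

Dividing by $\beta_n$ and taking the limit for $n \to +\infty$ yields

\[ \gamma\|x^*\|_{\ell^1} + \frac12\|x^*\|_{\ell^2}^2 \le \gamma\|x_\gamma^\delta\|_{\ell^1} + \frac12\|x_\gamma^\delta\|_{\ell^2}^2 \]

by observing the assumption $\lim_{n\to\infty}(\alpha_n - \alpha)/\beta_n = \gamma$ and the weak lower-semicontinuity of norms.
By the definition of the $\gamma\|\cdot\|_{\ell^1} + \frac12\|\cdot\|_{\ell^2}^2$-minimizing element $x_\gamma^\delta$ and its uniqueness, we
conclude that $x^* = x_\gamma^\delta$. Since every subsequence of $\{x^n\}_n$ has a subsequence converging weakly
to $x_\gamma^\delta$, the whole sequence $\{x^n\}_n$ converges weakly.

Appealing to the arguments in Theorem 2.1 again, there holds $\|x^n\|_{\ell^1} \to \|x_\gamma^\delta\|_{\ell^1}$, which
together with the weak convergence of the sequence implies

\[ x^n \to x_\gamma^\delta \ \text{in } \ell^1. \]

The proposition follows from the inequality $\|x\|_{\ell^2} \le \|x\|_{\ell^1}$. □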
The next corollary is a direct consequence of the proofs of the preceding results.

Corollary 2.3. The functions $\Phi_{\alpha,\beta}(x_{\alpha,\beta}^\delta)$, $\|x_{\alpha,\beta}^\delta\|_{\ell^1}$ and $\|x_{\alpha,\beta}^\delta\|_{\ell^2}$ are continuous in $(\alpha,\beta)$.

The next result shows the differentiability of the value function $F(\alpha,\beta) := \Phi_{\alpha,\beta}(x_{\alpha,\beta}^\delta)$.
Differentiability plays an important role in the efficient numerical realization of some rules for choosing
regularization parameters [HE [5U] .

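Theorem 2.4. The value function $F(\alpha,\beta)$ is differentiable with respect to $\alpha$ and $\beta$, and moreover

\[ \frac{\partial F}{\partial\alpha} = \|x_{\alpha,\beta}^\delta\|_{\ell^1} \quad\text{and}\quad \frac{\partial F}{\partial\beta} = \frac12\|x_{\alpha,\beta}^\delta\|_{\ell^2}^2. \]

Proof. For distinct $\alpha$ and $\tilde\alpha$, the minimizing property of $x_{\alpha,\beta}^\delta$ and $x_{\tilde\alpha,\beta}^\delta$ indicates

\begin{align*}
I &:= \frac12\|Kx_{\alpha,\beta}^\delta - y^\delta\|^2 + \alpha\|x_{\alpha,\beta}^\delta\|_{\ell^1} + \frac{\beta}{2}\|x_{\alpha,\beta}^\delta\|_{\ell^2}^2 - \frac12\|Kx_{\tilde\alpha,\beta}^\delta - y^\delta\|^2 - \alpha\|x_{\tilde\alpha,\beta}^\delta\|_{\ell^1} - \frac{\beta}{2}\|x_{\tilde\alpha,\beta}^\delta\|_{\ell^2}^2 \le 0, \\
\tilde I &:= \frac12\|Kx_{\tilde\alpha,\beta}^\delta - y^\delta\|^2 + \tilde\alpha\|x_{\tilde\alpha,\beta}^\delta\|_{\ell^1} + \frac{\beta}{2}\|x_{\tilde\alpha,\beta}^\delta\|_{\ell^2}^2 - \frac12\|Kx_{\alpha,\beta}^\delta - y^\delta\|^2 - \tilde\alpha\|x_{\alpha,\beta}^\delta\|_{\ell^1} - \frac{\beta}{2}\|x_{\alpha,\beta}^\delta\|_{\ell^2}^2 \le 0.
\end{align*}

Therefore, for $\alpha > \tilde\alpha$, we have

\begin{align*}
F(\alpha,\beta) - F(\tilde\alpha,\beta) &= \Phi_{\alpha,\beta}(x_{\alpha,\beta}^\delta) - \Phi_{\tilde\alpha,\beta}(x_{\tilde\alpha,\beta}^\delta) \\
&= I + (\alpha - \tilde\alpha)\|x_{\tilde\alpha,\beta}^\delta\|_{\ell^1} \le (\alpha - \tilde\alpha)\|x_{\tilde\alpha,\beta}^\delta\|_{\ell^1},
\end{align*}

and

\begin{align*}
F(\alpha,\beta) - F(\tilde\alpha,\beta) &= \Phi_{\alpha,\beta}(x_{\alpha,\beta}^\delta) - \Phi_{\tilde\alpha,\beta}(x_{\tilde\alpha,\beta}^\delta) \\
&= -\tilde I + (\alpha - \tilde\alpha)\|x_{\alpha,\beta}^\delta\|_{\ell^1} \ge (\alpha - \tilde\alpha)\|x_{\alpha,\beta}^\delta\|_{\ell^1}.
\end{align*}

These two inequalities together give

\[ \|x_{\alpha,\beta}^\delta\|_{\ell^1} \le \frac{F(\alpha,\beta) - F(\tilde\alpha,\beta)}{\alpha - \tilde\alpha} \le \|x_{\tilde\alpha,\beta}^\delta\|_{\ell^1}. \]

Reversing the role of $\alpha$ and $\tilde\alpha$ yields a similar inequality for $\alpha < \tilde\alpha$, which together with the
continuity result in Corollary 2.3 implies the first identity. The second identity can be shown
analogously. The differentiability of $F(\alpha,\beta)$ follows from the continuity of the functions $\|x_{\alpha,\beta}^\delta\|_{\ell^1}$
and $\frac12\|x_{\alpha,\beta}^\delta\|_{\ell^2}^2$ in $(\alpha,\beta)$, see Corollary 2.3. □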
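
The derivative formulas can be checked numerically in a setting where the minimizer is known in closed form. The sketch below assumes $K$ is the identity on $\mathbb{R}^n$ (an illustrative special case, not one treated separately in the text), so that the minimizer is the soft-thresholded data divided by $1 + \beta$, and compares central finite differences of the value function with $\|x\|_{\ell^1}$ and $\frac12\|x\|_{\ell^2}^2$. All function names are hypothetical.

```python
import numpy as np

def minimizer_identity(y, alpha, beta):
    """Closed-form minimizer when K is the identity: soft-threshold, then scale."""
    return np.sign(y) * np.maximum(np.abs(y) - alpha, 0.0) / (1.0 + beta)

def value_function(y, alpha, beta):
    """F(alpha, beta): the elastic-net functional evaluated at its minimizer (K = identity)."""
    x = minimizer_identity(y, alpha, beta)
    return 0.5 * np.sum((x - y) ** 2) + alpha * np.abs(x).sum() + 0.5 * beta * np.sum(x ** 2)

def check_derivatives(y, alpha, beta, h=1e-6):
    """Return (finite-difference dF/dalpha, ||x||_1, finite-difference dF/dbeta, 0.5*||x||_2^2)."""
    x = minimizer_identity(y, alpha, beta)
    dF_dalpha = (value_function(y, alpha + h, beta) - value_function(y, alpha - h, beta)) / (2 * h)
    dF_dbeta = (value_function(y, alpha, beta + h) - value_function(y, alpha, beta - h)) / (2 * h)
    return dF_dalpha, np.abs(x).sum(), dF_dbeta, 0.5 * np.sum(x ** 2)
```

The comparison is meaningful for parameter values where $\alpha$ does not coincide with any $|y_i|$, so that the finite-difference stencil does not straddle a change of the active set.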



2.2 Consistency and convergence rates 

In this section we shall investigate the convergence behavior of the minimizers $x_{\alpha,\beta}^\delta$ as the noise
level $\delta$ tends to zero for both a priori and a posteriori parameter choice rules. To this end, we
need the following definition of $\varphi$-minimizing solutions.

Definition 2.5. An element $x^\dagger$ is said to be a $\varphi$-minimizing solution to the inverse problem
$Kx = y^\dagger$ if it satisfies $Kx^\dagger = y^\dagger$ and

\[ \varphi(x^\dagger) \le \varphi(x), \quad \forall x \ \text{with } Kx = y^\dagger. \]

To simplify the notation, we introduce the functional $\mathcal{R}_\eta$ defined by

\[ \mathcal{R}_\eta(x) = \eta\|x\|_{\ell^1} + \frac12\|x\|_{\ell^2}^2. \]

We shall need the next result on the functional $\mathcal{R}_\eta$.

Lemma 2.6. Assume that $\{x^n\}_n$ converges weakly to $x^*$ in $\ell^2$ and $\mathcal{R}_\eta(x^n)$ converges to $\mathcal{R}_\eta(x^*)$.
Then $\mathcal{R}_\eta(x^n - x^*)$ converges to zero.
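Proof. The assumption $\mathcal{R}_\eta(x^n) \to \mathcal{R}_\eta(x^*)$ and Fatou's lemma imply that

\begin{align*}
\limsup_n \mathcal{R}_\eta(x^n - x^*) &= \limsup_n\big[2(\mathcal{R}_\eta(x^n) + \mathcal{R}_\eta(x^*)) - 2(\mathcal{R}_\eta(x^n) + \mathcal{R}_\eta(x^*)) + \mathcal{R}_\eta(x^n - x^*)\big] \\
&= 4\mathcal{R}_\eta(x^*) - \liminf_n\sum_i\Big[2\big(\eta|x_i^n| + \tfrac12|x_i^n|^2 + \eta|x_i^*| + \tfrac12|x_i^*|^2\big) - \big(\eta|x_i^n - x_i^*| + \tfrac12|x_i^n - x_i^*|^2\big)\Big] \\
&\le 4\mathcal{R}_\eta(x^*) - \sum_i\liminf_n\Big[2\big(\eta|x_i^n| + \tfrac12|x_i^n|^2 + \eta|x_i^*| + \tfrac12|x_i^*|^2\big) - \big(\eta|x_i^n - x_i^*| + \tfrac12|x_i^n - x_i^*|^2\big)\Big].
\end{align*}

By the weak convergence of $x^n$ to $x^*$, we have $x_i^n \to x_i^*$ for all $i \in \mathbb{N}$. Therefore,

\[ \sum_i\liminf_n\Big[2\big(\eta|x_i^n| + \tfrac12|x_i^n|^2 + \eta|x_i^*| + \tfrac12|x_i^*|^2\big) - \big(\eta|x_i^n - x_i^*| + \tfrac12|x_i^n - x_i^*|^2\big)\Big] = 4\sum_i\big(\eta|x_i^*| + \tfrac12|x_i^*|^2\big) = 4\mathcal{R}_\eta(x^*). \]

Combining the preceding inequalities we see

\[ \limsup_n \mathcal{R}_\eta(x^n - x^*) \le 4\mathcal{R}_\eta(x^*) - 4\mathcal{R}_\eta(x^*) = 0, \]

i.e. $\lim_{n\to\infty}\mathcal{R}_\eta(x^n - x^*) = 0$. □

Theorem 2.7. Assume that the regularization parameters $\alpha(\delta)$ and $\beta(\delta)$ satisfy

\[ \lim_{\delta\to0}\alpha(\delta) = \lim_{\delta\to0}\beta(\delta) = \lim_{\delta\to0}\frac{\delta^2}{\beta(\delta)} = 0, \tag{3} \]

and moreover that there exists some constant $\eta > 0$ such that

\[ \lim_{\delta\to0}\frac{\alpha(\delta)}{\beta(\delta)} = \eta. \tag{4} \]

Then the sequence of minimizers $\{x_{\alpha,\beta}^\delta\}_\delta$ converges to the $\eta\|\cdot\|_{\ell^1} + \frac12\|\cdot\|_{\ell^2}^2$-minimizing solution.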

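Proof. Let $x^\dagger$ be the unique $\eta\|\cdot\|_{\ell^1} + \frac12\|\cdot\|_{\ell^2}^2$-minimizing solution. The minimizing property of
$x_{\alpha,\beta}^\delta$ indicates

\begin{align*}
\frac12\|Kx_{\alpha,\beta}^\delta - y^\delta\|^2 + \alpha\|x_{\alpha,\beta}^\delta\|_{\ell^1} + \frac{\beta}{2}\|x_{\alpha,\beta}^\delta\|_{\ell^2}^2 &\le \frac12\|Kx^\dagger - y^\delta\|^2 + \alpha\|x^\dagger\|_{\ell^1} + \frac{\beta}{2}\|x^\dagger\|_{\ell^2}^2 \\
&\le \frac12\delta^2 + \alpha\|x^\dagger\|_{\ell^1} + \frac{\beta}{2}\|x^\dagger\|_{\ell^2}^2.
\end{align*}

By the assumptions on $\alpha(\delta)$ and $\beta(\delta)$, the sequences $\{\|Kx_{\alpha,\beta}^\delta - y^\delta\|\}_\delta$ and $\{\|x_{\alpha,\beta}^\delta\|_{\ell^2}\}_\delta$ are uniformly
bounded. Therefore, there exists a subsequence of $\{x_{\alpha,\beta}^\delta\}_\delta$, also denoted by $\{x_{\alpha,\beta}^\delta\}_\delta$, and some
$x^* \in \ell^2$, such that $x_{\alpha,\beta}^\delta \to x^*$ weakly.

By the weak lower semi-continuity and the triangle inequality we derive

\begin{align*}
\|Kx^* - y^\dagger\|^2 &\le 2\liminf_{\delta\to0}\big(\|Kx_{\alpha,\beta}^\delta - y^\delta\|^2 + \|y^\delta - y^\dagger\|^2\big) \\
&\le 2\liminf_{\delta\to0}\big\{\delta^2 + 2\alpha(\delta)\|x^\dagger\|_{\ell^1} + \beta(\delta)\|x^\dagger\|_{\ell^2}^2 + \delta^2\big\} = 0.
\end{align*}

Thereby we have $\|Kx^* - y^\dagger\|^2 = 0$, i.e. $Kx^* = y^\dagger$. Similarly,

\begin{align*}
\eta\|x^*\|_{\ell^1} + \frac12\|x^*\|_{\ell^2}^2 &\le \liminf_{\delta\to0}\Big\{\frac{\alpha(\delta)}{\beta(\delta)}\|x_{\alpha,\beta}^\delta\|_{\ell^1} + \frac12\|x_{\alpha,\beta}^\delta\|_{\ell^2}^2\Big\} \\
&\le \liminf_{\delta\to0}\Big\{\frac{\delta^2}{2\beta(\delta)} + \frac{\alpha(\delta)}{\beta(\delta)}\|x^\dagger\|_{\ell^1} + \frac12\|x^\dagger\|_{\ell^2}^2\Big\} \\
&= \eta\|x^\dagger\|_{\ell^1} + \frac12\|x^\dagger\|_{\ell^2}^2. \tag{5}
\end{align*}

Since $x^\dagger$ is the unique $\eta\|\cdot\|_{\ell^1} + \frac12\|\cdot\|_{\ell^2}^2$-minimizing solution we deduce $x^* = x^\dagger$. The whole sequence
converges weakly by appealing to the standard subsequence arguments. From inequality (5), we
have

\[ \lim_{\delta\to0}\Big(\eta\|x_{\alpha,\beta}^\delta\|_{\ell^1} + \frac12\|x_{\alpha,\beta}^\delta\|_{\ell^2}^2\Big) = \eta\|x^\dagger\|_{\ell^1} + \frac12\|x^\dagger\|_{\ell^2}^2. \]

By Lemma 2.6 and the weak convergence of the sequence $\{x_{\alpha,\beta}^\delta\}_\delta$, this identity implies that

\[ \lim_{\delta\to0}\|x_{\alpha,\beta}^\delta - x^\dagger\|_{\ell^2}^2 \le \lim_{\delta\to0}2\mathcal{R}_\eta(x_{\alpha,\beta}^\delta - x^\dagger) = 0. \]

□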



In Theorem 2.7 the first set of conditions on $\alpha(\delta)$ and $\beta(\delta)$, see equation (3), is rather standard,
whereas the other one in (4) seems restrictive. The following question arises naturally: Can we
further relax this condition? It turns out that the answer depends crucially on the structure of the set
$S = \{x : Kx = y^\dagger\}$. Obviously, if the set $S$ is a singleton, i.e. $K$ is injective, then the
$\eta\|\cdot\|_{\ell^1} + \frac12\|\cdot\|_{\ell^2}^2$-minimizing solution is independent of $\eta$ and thus the condition can be dropped.
In general, this condition cannot be relaxed, as the following simple example shows.



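Example 2.8. Consider the two-dimensional example with

\[ K = \begin{pmatrix} 1 & -2 \\ 2 & -4 \end{pmatrix} \quad\text{and}\quad y^\dagger = \begin{pmatrix} 1 \\ 2 \end{pmatrix}. \]

The set $S$ consists of elements of the form

\[ x(t) = \begin{pmatrix} 1 \\ 0 \end{pmatrix} + t\begin{pmatrix} 2 \\ 1 \end{pmatrix}, \quad t \in \mathbb{R}, \]

and the $\mathcal{R}_\eta$-minimizing solution $x^\eta$ minimizes

\[ \eta\|x\|_{\ell^1} + \frac12\|x\|_{\ell^2}^2 = \eta\big(|1 + 2t| + |t|\big) + \frac12\big((1 + 2t)^2 + t^2\big). \]

After some algebraic manipulations, the solution $x^\eta$ is found to be

\[ x^\eta = \begin{cases} \big(0, -\tfrac12\big)^T & \text{if } \eta \ge \tfrac12, \\ \big(\tfrac15 - \tfrac25\eta, \ -\tfrac25 - \tfrac15\eta\big)^T & \text{if } \eta < \tfrac12. \end{cases} \]

Interestingly, there exists a critical value $\eta^*$ of $\eta$: for $\eta > \eta^*$, the solution does not change, whereas
for $\eta < \eta^*$, the solution keeps on changing. In particular, the condition $\lim_{\delta\to0}\alpha(\delta)/\beta(\delta) = \eta$ is sharp
in the latter case.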
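
The dependence of the minimizing solution on $\eta$ in this example can be explored numerically. The sketch below takes the matrix $K$ from the example and assumes the data $y^\dagger = (1, 2)^T$, for which $(1, 0)^T$ is a particular solution and $(2, 1)^T$ spans the null space of $K$; it then minimizes the penalty along this line by a ternary search. The search interval and iteration count are illustrative choices.

```python
import numpy as np

K = np.array([[1.0, -2.0], [2.0, -4.0]])   # operator of the example
x_particular = np.array([1.0, 0.0])         # assumed particular solution, K @ x = (1, 2)
kernel_dir = np.array([2.0, 1.0])           # spans the null space of K

def penalty(t, eta):
    """eta*||x(t)||_1 + 0.5*||x(t)||_2^2 along the solution line x(t)."""
    x = x_particular + t * kernel_dir
    return eta * np.abs(x).sum() + 0.5 * (x @ x)

def minimizing_solution(eta, lo=-5.0, hi=5.0, iters=200):
    """Ternary search on the strictly convex map t -> penalty(t, eta)."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if penalty(m1, eta) <= penalty(m2, eta):
            hi = m2   # a minimizer lies in [lo, m2]
        else:
            lo = m1   # a minimizer lies in [m1, hi]
    return x_particular + 0.5 * (lo + hi) * kernel_dir
```

Under these assumptions, evaluating `minimizing_solution` for several values of $\eta$ on either side of $1/2$ shows the solution freezing above the critical value and varying below it.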



Denote the $\eta\|\cdot\|_{\ell^1} + \frac12\|\cdot\|_{\ell^2}^2$-minimizing solution by $x_\eta$. Since the arguments for $x_{\alpha,\beta}$ in Section
2.1 remain valid in the presence of constraints, we have the following result.

Lemma 2.9. For $\eta > 0$, we have



lim x„ 



where $x_\infty$ is taken to be the minimum-$\ell^2$-norm element of the set $\mathcal{S}$ of $\ell^1$-minimizing solutions to
the inverse problem. Moreover, the following identity holds



X n fl. 



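We shall need the following monotonicity result on the value functions $\|x_\eta\|_{\ell^1}$ and $\|x_\eta\|_{\ell^2}$.

Lemma 2.10. The function $\|x_\eta\|_{\ell^1}$ is monotonically decreasing, while $\|x_\eta\|_{\ell^2}^2$ is monotonically
increasing with respect to the parameter $\eta$ in the sense that for distinct $\eta_1$ and $\eta_2$

\[ \big(\|x_{\eta_1}\|_{\ell^1} - \|x_{\eta_2}\|_{\ell^1}\big)(\eta_1 - \eta_2) \le 0 \quad\text{and}\quad \big(\|x_{\eta_1}\|_{\ell^2}^2 - \|x_{\eta_2}\|_{\ell^2}^2\big)(\eta_1 - \eta_2) \ge 0. \]

Proof. Let $\eta_1, \eta_2 > 0$ be distinct. By the minimizing property of $x_{\eta_1}$ and $x_{\eta_2}$, we have

\begin{align*}
\eta_1\|x_{\eta_1}\|_{\ell^1} + \frac12\|x_{\eta_1}\|_{\ell^2}^2 &\le \eta_1\|x_{\eta_2}\|_{\ell^1} + \frac12\|x_{\eta_2}\|_{\ell^2}^2, \\
\eta_2\|x_{\eta_2}\|_{\ell^1} + \frac12\|x_{\eta_2}\|_{\ell^2}^2 &\le \eta_2\|x_{\eta_1}\|_{\ell^1} + \frac12\|x_{\eta_1}\|_{\ell^2}^2.
\end{align*}

Adding these two inequalities gives

\[ \big(\|x_{\eta_1}\|_{\ell^1} - \|x_{\eta_2}\|_{\ell^1}\big)(\eta_1 - \eta_2) \le 0, \]

i.e. the function $\|x_\eta\|_{\ell^1}$ is monotonically decreasing with respect to $\eta$. The monotonicity of $\|x_\eta\|_{\ell^2}$
follows analogously. □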

We shall also need the next result on the local Lipschitz continuity of $x_\eta$ in $\eta$. To this end,
we denote by $\partial\psi$ the subdifferential of a convex functional $\psi$, i.e. $\partial\psi(x) = \{\xi : \psi(y) - \psi(x) \ge
\langle\xi, y - x\rangle \ \forall y \in \mathrm{dom}(\psi)\}$. Since $\|x\|_{\ell^2}^2$ is continuous, we may apply the sum rule and get $\partial\mathcal{R}_\eta(x) =
\eta\,\partial\|x\|_{\ell^1} + x$. Note that the subdifferential is set-valued, and can be expressed in terms of the
function Sign defined componentwise by $\mathrm{Sign}(x)_k = \mathrm{sign}(x_k)$ for nonzero $x_k$ and $\mathrm{Sign}(x)_k = [-1, 1]$
otherwise, with the usual sign function.

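Lemma 2.11. The mapping $\eta \mapsto x_\eta$ is locally Lipschitz continuous in $\eta$ for $\eta > 0$.

Proof. Let $\xi_\eta$ be a subgradient of $\|x_\eta\|_{\ell^1}$. The minimizing property of $x_\eta$ indicates

\[ \langle\eta\xi_\eta + x_\eta, x_\eta - x\rangle \le 0, \quad \forall x \in S. \]

In particular, for distinct $\eta_1, \eta_2 > 0$, this yields

\begin{align*}
\langle\eta_1\xi_{\eta_1} + x_{\eta_1}, x_{\eta_1} - x_{\eta_2}\rangle &\le 0, \\
\langle\eta_2\xi_{\eta_2} + x_{\eta_2}, x_{\eta_2} - x_{\eta_1}\rangle &\le 0,
\end{align*}

by noting that both $x_{\eta_1}, x_{\eta_2} \in S$. Dividing these two inequalities by $\eta_1$ and $\eta_2$, respectively, and
adding them together gives

\[ \langle\xi_{\eta_1} - \xi_{\eta_2}, x_{\eta_1} - x_{\eta_2}\rangle + \frac{1}{\eta_1}\langle x_{\eta_1} - x_{\eta_2}, x_{\eta_1} - x_{\eta_2}\rangle \le \Big(\frac{1}{\eta_2} - \frac{1}{\eta_1}\Big)\langle x_{\eta_2}, x_{\eta_1} - x_{\eta_2}\rangle. \tag{6} \]

Recall that the subgradient operator of a convex functional is maximal monotone [25], i.e.

\[ \langle\xi_{\eta_1} - \xi_{\eta_2}, x_{\eta_1} - x_{\eta_2}\rangle \ge 0. \]

Applying this inequality and the Cauchy-Schwarz inequality in inequality (6) yields

\[ \|x_{\eta_1} - x_{\eta_2}\|_{\ell^2} \le \frac{\|x_{\eta_2}\|_{\ell^2}}{\eta_2}|\eta_1 - \eta_2|, \]

which by reversing the role of $\eta_1$ and $\eta_2$ gives

\[ \|x_{\eta_1} - x_{\eta_2}\|_{\ell^2} \le \min\Big\{\frac{\|x_{\eta_1}\|_{\ell^2}}{\eta_1}, \frac{\|x_{\eta_2}\|_{\ell^2}}{\eta_2}\Big\}|\eta_1 - \eta_2|. \]

This concludes the proof of the lemma. □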



By Lemma 2.10 the function $\|x_\eta\|_{\ell^2}$ is monotonically increasing with respect to $\eta$ and bounded,
and thus the limits $\lim_{\eta\to\infty}\|x_\eta\|_{\ell^2}$ and $\lim_{\eta\to0}\|x_\eta\|_{\ell^2}$ exist, which will be denoted by $\|x_\infty\|_{\ell^2}$ and
$\|x_0\|_{\ell^2}$, respectively.

Theorem 2.12. Assume that $\|x_\infty\|_{\ell^2} > \|x_0\|_{\ell^2}$. Then there exists a set $C \subset (0, +\infty)$ of positive
measure such that for each $\eta \in C$, the mapping $\eta \mapsto \|x_\eta\|_{\ell^2}$ is strictly increasing.

Proof. As noted above, the function $\|x_\eta\|_{\ell^2}$ is monotonically increasing and bounded, and thus it is
of bounded variation and almost everywhere differentiable. By the differentiation theory of functions
of bounded variation [1], the derivative $D\|x_\eta\|_{\ell^2}$ can be decomposed as

\[ D\|x_\eta\|_{\ell^2} = \frac{d\|x_\eta\|_{\ell^2}}{d\eta}\,d\eta + \mu_s + \mu_c, \]

where $\frac{d\|x_\eta\|_{\ell^2}}{d\eta}\,d\eta$, $\mu_s$ and $\mu_c$ denote the Lebesgue regular, singular and Cantor parts, respectively.
By Lemmas 2.9 and 2.11 the function $\|x_\eta\|_{\ell^2}$ is continuous and locally Lipschitz, and thus both
the singular and Cantor parts vanish. Consequently, the following integral identity holds

\[ \int_0^\infty\frac{d\|x_\eta\|_{\ell^2}}{d\eta}\,d\eta = \|x_\infty\|_{\ell^2} - \|x_0\|_{\ell^2}. \]

By the monotonicity of Lemma 2.10 the integrand $\frac{d\|x_\eta\|_{\ell^2}}{d\eta}$ is nonnegative. Therefore, there exists
a set $C \subset (0, +\infty)$ of positive measure, such that the integrand is positive, i.e. $\|x_\eta\|_{\ell^2}$ is strictly
increasing. □



Theorem 2.12 indicates that for $\eta \in C$ the function $\eta \mapsto \|x_\eta\|_{\ell^2}$ is strictly increasing. Therefore,
the condition $\lim_{\delta\to0}\alpha(\delta)/\beta(\delta) = \eta$ in Theorem 2.7 for some $\eta$ at least cannot be relaxed to:
$\liminf_{\delta\to0}\alpha(\delta)/\beta(\delta) = \eta_-$ and $\limsup_{\delta\to0}\alpha(\delta)/\beta(\delta) = \eta_+$ for some $\eta_-, \eta_+ > 0$ such that $(\eta_-, \eta_+) \cap
C \ne \emptyset$. This partially necessitates the condition $\lim_{\delta\to0}\alpha(\delta)/\beta(\delta) = \eta$ for some $\eta$ in Theorem 2.7.



Remark 2.13. Many of our preceding results remain valid for far more general regularization 
terms, e.g. general convex functionals. 



We are now in a position to discuss the convergence rates of a priori and a posteriori
parameter choice rules. The foregoing discussions indicate that the condition $\lim_{\delta\to0}\alpha(\delta)/\beta(\delta) = \eta$ is often
necessary for ensuring the convergence as $\delta$ tends to zero. Therefore, we shall assume that the
ratio of $\alpha$ and $\beta$ is fixed, i.e. there exists an $\eta$ such that $\alpha = \eta\beta$, for the choice rules. The next
theorem shows that elastic-net regularization behaves similarly to classical Tikhonov regularization
[11] in that an analogous error estimate holds under a slightly changed source condition.

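Theorem 2.14. Let $Kx^\dagger = y^\dagger$ and assume $\|y^\delta - y^\dagger\| \le \delta$. Moreover, let there be some $\eta > 0$
such that $x^\dagger$ fulfills the source condition

\[ \exists w : \ K^*w \in (\mathrm{id} + \eta\,\mathrm{Sign})(x^\dagger). \tag{7} \]

Then it holds that the minimizer $x_{\alpha,\beta}^\delta$ of $\Phi_{\alpha,\beta}$ with $\alpha = \eta\beta$ fulfills

\[ \|Kx_{\alpha,\beta}^\delta - y^\delta\| \le \delta + 2\beta\|w\| \]

and

\[ \|x_{\alpha,\beta}^\delta - x^\dagger\|_{\ell^2} \le \frac{\delta}{\sqrt\beta} + \sqrt\beta\,\|w\|. \]

Proof. By the minimizing property of $x_{\alpha,\beta}^\delta$ there holds

\[ \frac12\|Kx_{\alpha,\beta}^\delta - y^\delta\|^2 + \alpha\|x_{\alpha,\beta}^\delta\|_{\ell^1} + \frac{\beta}{2}\|x_{\alpha,\beta}^\delta\|_{\ell^2}^2 \le \frac12\|Kx^\dagger - y^\delta\|^2 + \alpha\|x^\dagger\|_{\ell^1} + \frac{\beta}{2}\|x^\dagger\|_{\ell^2}^2, \]

which leads to

\[ \frac12\|Kx_{\alpha,\beta}^\delta - y^\delta\|^2 + \alpha\big(\|x_{\alpha,\beta}^\delta\|_{\ell^1} - \|x^\dagger\|_{\ell^1}\big) + \frac{\beta}{2}\big(\|x_{\alpha,\beta}^\delta\|_{\ell^2}^2 - \|x^\dagger\|_{\ell^2}^2\big) \le \frac12\|Kx^\dagger - y^\delta\|^2. \]

Using the identity $\|x_{\alpha,\beta}^\delta\|_{\ell^2}^2 - \|x^\dagger\|_{\ell^2}^2 = \|x_{\alpha,\beta}^\delta - x^\dagger\|_{\ell^2}^2 + 2\langle x^\dagger, x_{\alpha,\beta}^\delta - x^\dagger\rangle$ we get for any $\xi \in \mathrm{Sign}(x^\dagger)$

\begin{align*}
\frac12\|Kx_{\alpha,\beta}^\delta - y^\delta\|^2 &+ \alpha\underbrace{\big(\|x_{\alpha,\beta}^\delta\|_{\ell^1} - \|x^\dagger\|_{\ell^1} - \langle\xi, x_{\alpha,\beta}^\delta - x^\dagger\rangle\big)}_{\ge 0} + \alpha\langle\xi, x_{\alpha,\beta}^\delta - x^\dagger\rangle \\
&+ \frac{\beta}{2}\big(\|x_{\alpha,\beta}^\delta - x^\dagger\|_{\ell^2}^2 + 2\langle x^\dagger, x_{\alpha,\beta}^\delta - x^\dagger\rangle\big) \le \frac12\|Kx^\dagger - y^\delta\|^2.
\end{align*}

We conclude

\[ \frac12\|Kx_{\alpha,\beta}^\delta - y^\delta\|^2 + \beta\langle\eta\xi + x^\dagger, x_{\alpha,\beta}^\delta - x^\dagger\rangle + \frac{\beta}{2}\|x_{\alpha,\beta}^\delta - x^\dagger\|_{\ell^2}^2 \le \frac12\|Kx^\dagger - y^\delta\|^2. \]

Since $\xi \in \mathrm{Sign}(x^\dagger)$ is arbitrary, we may choose it in such a way that the source condition (7), i.e.
$\eta\xi + x^\dagger = K^*w$, holds. Consequently

\[ \frac12\|Kx_{\alpha,\beta}^\delta - y^\delta\|^2 + \beta\langle w, Kx_{\alpha,\beta}^\delta - y^\delta\rangle + \frac{\beta}{2}\|x_{\alpha,\beta}^\delta - x^\dagger\|_{\ell^2}^2 \le \frac12\|Kx^\dagger - y^\delta\|^2 + \beta\langle w, Kx^\dagger - y^\delta\rangle. \]

Completing the squares on both sides by adding $\beta^2\|w\|^2/2$ leads to

\[ \frac12\|Kx_{\alpha,\beta}^\delta - y^\delta + \beta w\|^2 + \frac{\beta}{2}\|x_{\alpha,\beta}^\delta - x^\dagger\|_{\ell^2}^2 \le \frac12\|Kx^\dagger - y^\delta + \beta w\|^2, \]

which proves the theorem. □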
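
The final step of the proof can be spelled out as follows. This is a sketch of how a residual bound and an $\ell^2$ error bound may be read off the completed-square inequality, using only $\|Kx^\dagger - y^\delta\| \le \delta$ and the triangle inequality.

```latex
\begin{align*}
\tfrac12\|Kx_{\alpha,\beta}^\delta - y^\delta + \beta w\|^2
  + \tfrac{\beta}{2}\|x_{\alpha,\beta}^\delta - x^\dagger\|_{\ell^2}^2
  &\le \tfrac12\|Kx^\dagger - y^\delta + \beta w\|^2
   \le \tfrac12\big(\delta + \beta\|w\|\big)^2, \\
\|Kx_{\alpha,\beta}^\delta - y^\delta\|
  &\le \|Kx_{\alpha,\beta}^\delta - y^\delta + \beta w\| + \beta\|w\|
   \le \delta + 2\beta\|w\|, \\
\|x_{\alpha,\beta}^\delta - x^\dagger\|_{\ell^2}
  &\le \frac{\delta + \beta\|w\|}{\sqrt{\beta}}
   = \frac{\delta}{\sqrt{\beta}} + \sqrt{\beta}\,\|w\|.
\end{align*}
```

In particular, balancing the two terms of the last line with $\beta$ proportional to $\delta$ gives an error of order $\delta^{1/2}$.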

The source condition (7) in Theorem 2.14 is equivalent to: There exists a $w$ such that
$K^*w \in \partial\mathcal{R}_\eta(x^\dagger)$. It can be interpreted as the existence of a Lagrange multiplier to the Lagrangian
of a constrained optimization problem [5]. Theorem 2.14, in particular, implies that for the choice
$\beta = O(\delta)$, the reconstruction $x_{\alpha,\beta}^\delta$ achieves a convergence rate of order $O(\delta^{1/2})$.

The ultimate goal of elastic-net regularization is to retrieve a sparse signal. Under the premise
that the underlying signal $x^\dagger$ is truly sparse, the convergence rate can be significantly improved by
using a technique recently developed by Grasmair et al. [14]. To this end, we need the so-called
finite basis injectivity property of the operator $K$.

Definition 2.15 ([4]). An operator $K : \ell^2 \to \mathcal{H}_2$ has the finite basis injectivity property, if for
all finite subsets $I \subset \mathbb{N}$ the operator $K|_I$ is injective, i.e. for all $u, v \in \ell^2$ with $Ku = Kv$ and
$u_i = v_i = 0$ for all $i \notin I$ it follows $u = v$.

The next lemma will play a role in establishing an improved convergence rate. 

Lemma 2.16. Assume that the solution $x^\dagger$ is sparse and satisfies the source condition (7), and
that the operator $K$ satisfies the finite basis injectivity property. Then there exist two positive
constants $c_1$ and $c_2$ such that

\[ \mathcal{R}_\eta(x) - \mathcal{R}_\eta(x^\dagger) \ge c_1\|x - x^\dagger\|_{\ell^2} - c_2\|K(x - x^\dagger)\|. \]
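Proof. Let $\xi \in \mathrm{Sign}(x^\dagger)$ be such that (7) is satisfied. Denote by $I$ the index set $\{i \in \mathbb{N} : |\xi_i| > \frac12\}$.
Since $\xi \in \ell^2$, the set $I$ is finite, and obviously, it contains the support of $x^\dagger$. Let $\pi_I$ and $\pi_I^\perp$ be
the natural projections onto $I$ and $\mathbb{N}\setminus I$, respectively. Then $\pi_I x^\dagger = x^\dagger$ and $\pi_I^\perp x^\dagger = 0$. By the finite
basis injectivity property of the operator $K$, we have for some constant $C$

\[ C\|K\pi_I x\| \ge \|\pi_I x\|_{\ell^2}. \]

Consequently,

\begin{align*}
\|x - x^\dagger\|_{\ell^2} &\le \|\pi_I(x - x^\dagger)\|_{\ell^2} + \|\pi_I^\perp x\|_{\ell^2} \\
&\le C\|K(x - x^\dagger)\| + (1 + C\|K\|)\|\pi_I^\perp x\|_{\ell^2}.
\end{align*}

The source condition (7) implies that

\begin{align*}
-\langle x^\dagger + \eta\xi, x - x^\dagger\rangle &= -\langle K^*w, x - x^\dagger\rangle \\
&= -\langle w, K(x - x^\dagger)\rangle \\
&\le \|w\|\,\|K(x - x^\dagger)\|. \tag{8}
\end{align*}

Now let $m = \max_{i\notin I}|\xi_i| \le \frac12$. By the inequality $\|x\|_{\ell^2} \le \|x\|_{\ell^1}$, we derive that

\begin{align*}
\|\pi_I^\perp x\|_{\ell^2} &\le \sum_{i\notin I}|x_i| \le 2\sum_{i\notin I}(1 - m)|x_i| \le 2\sum_{i\notin I}\big(|x_i| - \xi_i x_i\big) \\
&= 2\sum_{i\notin I}\big(|x_i| - |x_i^\dagger| - \xi_i(x_i - x_i^\dagger)\big) \\
&\le 2\Big(\|x\|_{\ell^1} - \|x^\dagger\|_{\ell^1} - \langle\xi, x - x^\dagger\rangle + \frac{1}{2\eta}\big[\|x\|_{\ell^2}^2 - \|x^\dagger\|_{\ell^2}^2 - 2\langle x^\dagger, x - x^\dagger\rangle\big]\Big) \\
&= 2\eta^{-1}\big([\mathcal{R}_\eta(x) - \mathcal{R}_\eta(x^\dagger)] - \langle\eta\xi + x^\dagger, x - x^\dagger\rangle\big) \\
&\le 2\eta^{-1}\big[\mathcal{R}_\eta(x) - \mathcal{R}_\eta(x^\dagger) + \|w\|\,\|K(x - x^\dagger)\|\big],
\end{align*}

where we have used the identity $\|x\|_{\ell^2}^2 - \|x^\dagger\|_{\ell^2}^2 - 2\langle x^\dagger, x - x^\dagger\rangle = \|x - x^\dagger\|_{\ell^2}^2 \ge 0$, inequality (8)
and the fact that $x^\dagger$ vanishes outside the index set $I$.
Combining the above estimates gives

\[ \|x - x^\dagger\|_{\ell^2} \le \big(C + 2\eta^{-1}(1 + C\|K\|)\|w\|\big)\|K(x - x^\dagger)\| + 2\eta^{-1}(1 + C\|K\|)\big[\mathcal{R}_\eta(x) - \mathcal{R}_\eta(x^\dagger)\big]. \]

This concludes the proof of the lemma. □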

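With the help of Lemma 2.16 we are now ready to state an improved error estimate.

Theorem 2.17. Under the conditions in Lemma 2.16 there holds with the constants stated there

\[ \|x_{\alpha,\beta}^\delta - x^\dagger\|_{\ell^2} \le \frac{\delta^2}{2c_1\beta} + \frac{c_2^2\beta}{2c_1} + \frac{c_2\delta}{c_1}. \]

Proof. Since $x_{\alpha,\beta}^\delta$ minimizes $\Phi_{\alpha,\beta}$, the inequality

\[ \frac12\|Kx_{\alpha,\beta}^\delta - y^\delta\|^2 + \beta\mathcal{R}_\eta(x_{\alpha,\beta}^\delta) \le \frac12\|Kx^\dagger - y^\delta\|^2 + \beta\mathcal{R}_\eta(x^\dagger) \]

holds. Utilizing the fact $\|Kx^\dagger - y^\delta\| \le \delta$, the triangle inequality and Lemma 2.16 we have

\begin{align*}
\frac12\delta^2 &\ge \beta\big(\mathcal{R}_\eta(x_{\alpha,\beta}^\delta) - \mathcal{R}_\eta(x^\dagger)\big) + \frac12\|Kx_{\alpha,\beta}^\delta - y^\delta\|^2 \\
&\ge \beta c_1\|x_{\alpha,\beta}^\delta - x^\dagger\|_{\ell^2} - \beta c_2\|K(x_{\alpha,\beta}^\delta - x^\dagger)\| + \frac12\|Kx_{\alpha,\beta}^\delta - y^\delta\|^2 \\
&\ge \beta c_1\|x_{\alpha,\beta}^\delta - x^\dagger\|_{\ell^2} - \beta c_2\|Kx_{\alpha,\beta}^\delta - y^\delta\| - \beta c_2\delta + \frac12\|Kx_{\alpha,\beta}^\delta - y^\delta\|^2.
\end{align*}

Applying the inequality $ab \le \frac12a^2 + \frac12b^2$ with $a = c_2\beta$ and $b = \|Kx_{\alpha,\beta}^\delta - y^\delta\|$ concludes the proof
of the theorem. □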
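
For completeness, the concluding step can be written out. The following is a sketch of how the elementary inequality, applied with $a = c_2\beta$ and $b$ the residual norm, removes the residual terms from the last line of the chain of inequalities in the proof.

```latex
\begin{align*}
\beta c_2\,\|Kx_{\alpha,\beta}^\delta - y^\delta\|
  &\le \tfrac12 c_2^2\beta^2 + \tfrac12\|Kx_{\alpha,\beta}^\delta - y^\delta\|^2, \\
\tfrac12\delta^2
  &\ge \beta c_1\|x_{\alpha,\beta}^\delta - x^\dagger\|_{\ell^2}
     - \tfrac12 c_2^2\beta^2 - \beta c_2\delta, \\
\|x_{\alpha,\beta}^\delta - x^\dagger\|_{\ell^2}
  &\le \frac{\delta^2}{2c_1\beta} + \frac{c_2^2\beta}{2c_1} + \frac{c_2\delta}{c_1}.
\end{align*}
```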

Remark 2.18. We see that for the choice $\beta \sim O(\delta)$, there exists some constant $c$ such that

\[ \|x_{\alpha,\beta}^\delta - x^\dagger\|_{\ell^2} \le c\delta. \]

Hence, the preceding two theorems indicate that the elastic-net regularization can simultaneously
preserve the convergence rate of classical Tikhonov regularization and that of $\ell^1$-regularization.
The rate of the latter is better, but the constant $c$ can be huge. Elastic-net regularization remedies
this by retaining the classical rate with a probably more modest constant. Altogether, we obtain
by combining Theorems 2.17 and 2.14 that with $\beta = \delta$ there holds

\[ \|x_{\alpha,\beta}^\delta - x^\dagger\|_{\ell^2} \le \min\big(c\delta, (1 + \|w\|)\sqrt\delta\big). \tag{9} \]

2.3 A posteriori parameter choice

We now turn to an a posteriori parameter choice rule, i.e. the discrepancy principle in the sense of
Morozov, for determining the regularization parameter. Note that a priori choice rules usually give
only an order of magnitude instead of a precise value, which undoubtedly impedes their practical
application. In contrast, the discrepancy principle enables constructing a concrete scheme for
determining an appropriate regularization parameter. However, there have been relatively few
investigations of a posteriori choice rules for regularization involving general convex functionals
[U [2U] . The subsequent developments are motivated by those in reference [3D] . Mathematically,
the principle amounts to solving a nonlinear equation in $\beta$

\[ \|Kx_{\alpha,\beta}^\delta - y^\delta\| = \tau\delta \tag{10} \]

for some $\tau > 1$. Without loss of generality, we shall fix $\tau = 1$ in the sequel.
We shall need the next lemma.

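Lemma 2.19. The minimizer to the functional $\Phi_{\alpha,\beta}$ vanishes if and only if $\alpha \ge \sup_{h\in\ell^2, h\ne0}\frac{\langle K^*y^\delta, h\rangle}{\|h\|_{\ell^1}}$.

Proof. Assume that $0$ is a minimizer of the functional $\Phi_{\alpha,\beta}$. The minimizing property of $0$ implies
that for any $h \in \ell^2$

\[ \frac12\|y^\delta\|^2 \le \frac12\|Kh - y^\delta\|^2 + \alpha\|h\|_{\ell^1} + \frac{\beta}{2}\|h\|_{\ell^2}^2. \]

Collecting the terms gives

\[ \langle K^*y^\delta, h\rangle \le \frac12\|Kh\|^2 + \alpha\|h\|_{\ell^1} + \frac{\beta}{2}\|h\|_{\ell^2}^2. \]

By dividing by $\|h\|_{\ell^1}$ and setting $h = \varepsilon h'$ and letting $\varepsilon$ tend to zero we deduce that

\[ \alpha \ge \sup_{h\in\ell^2, h\ne0}\frac{\langle K^*y^\delta, h\rangle}{\|h\|_{\ell^1}}. \]

Conversely, assume that the above inequality holds. Then for any $h \in \ell^2$, there holds

\[ \langle K^*y^\delta, h\rangle \le \alpha\|h\|_{\ell^1} \le \frac12\|Kh\|^2 + \alpha\|h\|_{\ell^1} + \frac{\beta}{2}\|h\|_{\ell^2}^2. \]

Completing the square gives

\[ \frac12\|y^\delta\|^2 \le \frac12\|Kh - y^\delta\|^2 + \alpha\|h\|_{\ell^1} + \frac{\beta}{2}\|h\|_{\ell^2}^2. \]

By the definition of the minimizer, we conclude that $0$ is the minimizer of the functional $\Phi_{\alpha,\beta}$. □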
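
In finite dimensions the supremum in the lemma equals the maximum norm of $K^*y^\delta$, since the dual norm of $\ell^1$ is the $\ell^\infty$ norm. The sketch below computes this threshold and tests whether zero is the minimizer by checking that it is a fixed point of one proximal-gradient step, which characterizes minimizers of convex problems of this type. The function names and the use of a proximal-gradient step are illustrative assumptions, not part of the text.

```python
import numpy as np

def zero_threshold(K, y):
    """sup over h != 0 of <K^T y, h> / ||h||_1, i.e. the max-norm of K^T y."""
    return np.max(np.abs(K.T @ y))

def prox_gradient_step_from_zero(K, y, alpha, beta):
    """One proximal-gradient step started at x = 0; the result is 0 iff 0 is the minimizer."""
    step = 1.0 / (np.linalg.norm(K, 2) ** 2 + beta)
    v = step * (K.T @ y)   # gradient step from x = 0: 0 - step * (K^T(K0 - y) + beta*0)
    return np.sign(v) * np.maximum(np.abs(v) - step * alpha, 0.0)
```

For any $\alpha$ at or above `zero_threshold(K, y)` the step returns the zero vector, while for smaller $\alpha$ at least one component becomes nonzero.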
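The next result shows the existence and uniqueness of the solution to equation (10).

Theorem 2.20. Assume that the conditions $\lim_{\beta\to0}\|Kx_{\eta\beta,\beta}^\delta - y^\delta\| < \delta$ and $\|y^\delta\| > \delta$ hold. Then
there exists at least one solution $\beta^*$ to equation (10). Moreover, if the solution $\beta^*$ satisfies $\beta^*\eta <
\sup_{h\in\ell^2, h\ne0}\frac{\langle K^*y^\delta, h\rangle}{\|h\|_{\ell^1}}$, then it is also unique.

Proof. Let $\beta_1$ and $\beta_2$ be distinct and for $i = 1, 2$ denote $\alpha_i = \eta\beta_i$. By the minimizing property of
$x_{\alpha_1,\beta_1}^\delta$ and $x_{\alpha_2,\beta_2}^\delta$ we have

\begin{align*}
\frac12\|Kx_{\alpha_1,\beta_1}^\delta - y^\delta\|^2 + \beta_1\mathcal{R}_\eta(x_{\alpha_1,\beta_1}^\delta) &\le \frac12\|Kx_{\alpha_2,\beta_2}^\delta - y^\delta\|^2 + \beta_1\mathcal{R}_\eta(x_{\alpha_2,\beta_2}^\delta), \\
\frac12\|Kx_{\alpha_2,\beta_2}^\delta - y^\delta\|^2 + \beta_2\mathcal{R}_\eta(x_{\alpha_2,\beta_2}^\delta) &\le \frac12\|Kx_{\alpha_1,\beta_1}^\delta - y^\delta\|^2 + \beta_2\mathcal{R}_\eta(x_{\alpha_1,\beta_1}^\delta).
\end{align*}

From these two inequalities we derive

\[ \big(\|Kx_{\alpha_1,\beta_1}^\delta - y^\delta\|^2 - \|Kx_{\alpha_2,\beta_2}^\delta - y^\delta\|^2\big)(\beta_1 - \beta_2) \ge 0, \]

i.e. $\|Kx_{\alpha,\beta}^\delta - y^\delta\|$ is monotonic in $\beta$. By Corollary 2.3 it is continuous. Therefore under the
conditions $\lim_{\beta\to0}\|Kx_{\eta\beta,\beta}^\delta - y^\delta\| < \delta$ and $\|y^\delta\| > \delta$, we have

\[ \lim_{\beta\to0}\|Kx_{\alpha,\beta}^\delta - y^\delta\| < \delta \quad\text{and}\quad \lim_{\beta\to\infty}\|Kx_{\alpha,\beta}^\delta - y^\delta\| = \|y^\delta\| > \delta. \]

The existence of at least one positive solution to equation (10) now follows from the continuity.
The optimality condition for $x_{\alpha,\beta}^\delta$ reads

\[ -K^*(Kx_{\alpha,\beta}^\delta - y^\delta) - \beta x_{\alpha,\beta}^\delta \in \beta\eta\,\partial\|x_{\alpha,\beta}^\delta\|_{\ell^1}. \]

Taking the inner product of both sides of the inclusion with $x_{\alpha,\beta}^\delta$ gives

\[ \langle Kx_{\alpha,\beta}^\delta, Kx_{\alpha,\beta}^\delta - y^\delta\rangle + \beta\|x_{\alpha,\beta}^\delta\|_{\ell^2}^2 + \beta\eta\|x_{\alpha,\beta}^\delta\|_{\ell^1} = 0. \]

Under the assumption $\beta\eta < \sup_{h\in\ell^2, h\ne0}\frac{\langle K^*y^\delta, h\rangle}{\|h\|_{\ell^1}}$, $x_{\alpha,\beta}^\delta$ is nonzero, and thus for distinct $\beta_1$ and $\beta_2$,
$x_{\alpha_1,\beta_1}^\delta$ and $x_{\alpha_2,\beta_2}^\delta$ are distinct.

We show the uniqueness by means of contradiction. Assume that there exist two distinct
solutions $\beta_1$ and $\beta_2$ to equation (10). By the minimizing property and distinctness of $x_{\alpha_1,\beta_1}^\delta$ and
$x_{\alpha_2,\beta_2}^\delta$, we have

\[ \frac12\|Kx_{\alpha_1,\beta_1}^\delta - y^\delta\|^2 + \beta_1\mathcal{R}_\eta(x_{\alpha_1,\beta_1}^\delta) < \frac12\|Kx_{\alpha_2,\beta_2}^\delta - y^\delta\|^2 + \beta_1\mathcal{R}_\eta(x_{\alpha_2,\beta_2}^\delta), \]

which together with $\|Kx_{\alpha_1,\beta_1}^\delta - y^\delta\| = \|Kx_{\alpha_2,\beta_2}^\delta - y^\delta\|$ implies that $\mathcal{R}_\eta(x_{\alpha_1,\beta_1}^\delta) < \mathcal{R}_\eta(x_{\alpha_2,\beta_2}^\delta)$.
Reversing the role of $\beta_1$ and $\beta_2$ gives $\mathcal{R}_\eta(x_{\alpha_2,\beta_2}^\delta) < \mathcal{R}_\eta(x_{\alpha_1,\beta_1}^\delta)$, which is a contradiction. □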
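
Since the residual is continuous and monotonic in $\beta$, the discrepancy equation can be solved by bisection. The following is a minimal sketch under several assumptions: the ratio $\eta$ is fixed with $\alpha = \eta\beta$, $\tau = 1$, the bracket `[beta_lo, beta_hi]` contains a solution, and the inner minimization is done by a plain proximal-gradient loop standing in for the active set methods of Section 3. All names and default values are illustrative.

```python
import numpy as np

def solve_elastic_net(K, y, alpha, beta, iters=1000):
    """Plain proximal-gradient solver (a stand-in, not the paper's active set methods)."""
    step = 1.0 / (np.linalg.norm(K, 2) ** 2 + beta)
    x = np.zeros(K.shape[1])
    for _ in range(iters):
        v = x - step * (K.T @ (K @ x - y) + beta * x)
        x = np.sign(v) * np.maximum(np.abs(v) - step * alpha, 0.0)
    return x

def discrepancy_beta(K, y, delta, eta, beta_lo=1e-8, beta_hi=1e4, iters=60):
    """Bisection in log(beta) for ||K x - y|| = delta with alpha = eta * beta."""
    def residual(beta):
        x = solve_elastic_net(K, y, eta * beta, beta)
        return np.linalg.norm(K @ x - y)
    for _ in range(iters):
        mid = np.sqrt(beta_lo * beta_hi)   # geometric midpoint of the bracket
        if residual(mid) > delta:
            beta_hi = mid                  # residual too large: decrease beta
        else:
            beta_lo = mid                  # residual small enough: increase beta
    return np.sqrt(beta_lo * beta_hi)
```

For ill-conditioned $K$ and small $\beta$ the inner loop may need many more iterations than the default, so the returned value should be checked by recomputing the residual.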

The next result shows the consistency of the discrepancy principle for elastic-net regularization.
Recall that the regularization parameter $\beta$ determined by the discrepancy principle depends
on both $\delta$ and $y^\delta$, although the dependence is suppressed for notational simplicity.



Theorem 2.21. Let (3 be determined by equation (10), and x^ be the -minimizing solution of 
the inverse problem. Then we have 



lim x 

5^0 



a, 13 



13 



Proof. By the minimizing property of the solution 4,/3> we have 



This together with equation (10 1 and the fact that \\Kx' — y 5 \\ < S indicates that 

n v {x s atf3 ) < n^), (ii) 

i.e. the sequence {TZ v (x^ p)}s is uniformly bounded. Therefore the sequence {4 p} is uniformly 
bounded, and there exists a subsequence of {x s a p}s, also denoted as {x & a p}, and some x* , such 
that 4 p converges weakly to x* . 
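By the triangle inequality, we have

\[ \|Kx^\delta_{\alpha,\beta} - y^\dagger\| \le \|Kx^\delta_{\alpha,\beta} - y^\delta\| + \|y^\delta - y^\dagger\| \le \delta + \delta = 2\delta. \]

Therefore, weak lower semicontinuity of the norm gives

\[ 0 \le \|Kx^* - y^\dagger\| \le \liminf_{\delta\to0}\|Kx^\delta_{\alpha,\beta} - y^\dagger\| \le \lim_{\delta\to0} 2\delta = 0, \]

i.e. $\|Kx^* - y^\dagger\| = 0$ or $Kx^* = y^\dagger$. From inequality (11), we have

\[ \mathcal{R}_\eta(x^*) \le \liminf_{\delta\to0}\mathcal{R}_\eta(x^\delta_{\alpha,\beta}) \le \mathcal{R}_\eta(x^\dagger). \]

Therefore, $x^*$ is an $\mathcal{R}_\eta$-minimizing solution, and by noting the uniqueness of the $\mathcal{R}_\eta$-minimizer $x^\dagger$,
we deduce that $x^* = x^\dagger$. Since every subsequence of $\{x^\delta_{\alpha,\beta}\}$ has a subsequence weakly converging
to $x^\dagger$, the whole sequence weakly converges to $x^\dagger$.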
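Furthermore, we have

\[ \mathcal{R}_\eta(x^\dagger) \le \liminf_{\delta\to0}\mathcal{R}_\eta(x^\delta_{\alpha,\beta}) \le \limsup_{\delta\to0}\mathcal{R}_\eta(x^\delta_{\alpha,\beta}) \le \mathcal{R}_\eta(x^\dagger), \]

i.e.

\[ \lim_{\delta\to0}\mathcal{R}_\eta(x^\delta_{\alpha,\beta}) = \mathcal{R}_\eta(x^\dagger). \]

This together with the weak convergence and Lemma 2.6 implies the desired strong convergence. □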

The next result shows that the discrepancy principle achieves similar convergence rates as the 
a priori parameter choice rule under identical conditions. 

Theorem 2.22. Assume that the exact solution $x^\dagger$ satisfies the source condition and the
regularization parameter $\beta$ is determined according to equation (10). Then there holds

\[ \|x^\delta_{\alpha,\beta} - x^\dagger\|_{\ell^2} \le 2\|w\|^{1/2}\delta^{1/2}. \]

Moreover, if the conditions of Lemma 2.16 hold, then there holds with the constants given there

\[ \|x^\delta_{\alpha,\beta} - x^\dagger\|_{\ell^1} \le \frac{2c_2}{c_1}\,\delta. \]

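Proof. Since $x^\dagger$ satisfies the source condition, there exists $\xi \in \partial\|x^\dagger\|_{\ell^1}$ such that $K^*w = \eta\xi + x^\dagger$.
Inequality (11) implies that

\[
\begin{aligned}
\tfrac12\|x^\delta_{\alpha,\beta} - x^\dagger\|^2_{\ell^2} + \eta\big(\|x^\delta_{\alpha,\beta}\|_{\ell^1} - \|x^\dagger\|_{\ell^1} - \langle\xi, x^\delta_{\alpha,\beta} - x^\dagger\rangle\big)
  &\le -\langle\eta\xi + x^\dagger, x^\delta_{\alpha,\beta} - x^\dagger\rangle \\
  &= -\langle K^*w, x^\delta_{\alpha,\beta} - x^\dagger\rangle \\
  &= -\langle w, Kx^\delta_{\alpha,\beta} - y^\dagger\rangle \\
  &\le \|w\|\,\|Kx^\delta_{\alpha,\beta} - y^\dagger\| \le 2\|w\|\delta.
\end{aligned}
\]

However, noting $\xi \in \partial\|x^\dagger\|_{\ell^1}$, we have by the defining inequality of a subgradient

\[ \|x^\delta_{\alpha,\beta}\|_{\ell^1} - \|x^\dagger\|_{\ell^1} - \langle\xi, x^\delta_{\alpha,\beta} - x^\dagger\rangle \ge 0. \]

Combining the above two inequalities gives the first estimate.

By inequality (11) and Lemma 2.16 we have

\[
\begin{aligned}
c_1\|x^\delta_{\alpha,\beta} - x^\dagger\|_{\ell^1} &\le \mathcal{R}_\eta(x^\delta_{\alpha,\beta}) - \mathcal{R}_\eta(x^\dagger) + c_2\|K(x^\delta_{\alpha,\beta} - x^\dagger)\| \\
  &\le c_2\|K(x^\delta_{\alpha,\beta} - x^\dagger)\| \le 2c_2\delta.
\end{aligned}
\]

This concludes the proof of the theorem. □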



3 Active set algorithms 

Having established the analytical properties of the elastic-net functional and its minimizers, we
now turn to algorithms for minimizing the functional. We will derive adaptations of the
SSN [15] and FSS [21] and show that these algorithms are regularized versions of the respective
$\ell^1$-algorithms. In addition, we show convergence results for both methods. For notational simplicity,
we shall drop the superscript $\delta$ in this section.



3.1 Regularized SSN (RSSN) 

We now derive an algorithm for the elastic-net functional $\Phi_{\alpha,\beta}$ based on the SSN [HI [28], which
in turn coincides with a regularization of the SSN [15] and hence will be called RSSN.
Using subdifferential calculus, the optimality condition for $\Phi_{\alpha,\beta}$ reads

\[ 0 \in \partial\Phi_{\alpha,\beta}(x) = \partial\Psi_\alpha(x) + \beta x. \]

With the help of the set-valued Sign function, it reads

\[ -K^*(Kx - y) - \beta x \in \alpha\,\mathrm{Sign}(x). \qquad (12) \]

The similarity of the optimality conditions for classical $\ell^1$- and elastic-net minimization suggests
adapting existing $\ell^1$-algorithms. It can be formulated equivalently using the soft-shrinkage function
$S_\alpha$, which is defined componentwise by $S_\alpha(x)_i = \max\{0, |x_i| - \alpha\}\cdot\mathrm{sign}(x_i)$.



Lemma 3.1. An element $x$ solves equation (12) if and only if

\[ F(x) := \beta x - S_\alpha(-K^*(Kx - y)) = 0. \qquad (13) \]

Proof. Obviously, the inclusion (12) is equivalent to

\[ -\frac{K^*(Kx - y)}{\beta} \in x + \frac{\alpha}{\beta}\,\mathrm{Sign}(x). \]

Noting the identity $S_\alpha = (\mathrm{id} + \alpha\,\mathrm{Sign})^{-1}$ (see, e.g. [15]) it follows that

\[ x = S_{\alpha/\beta}(-K^*(Kx - y)/\beta). \]

Now the identity $S_{c\alpha}(cx) = cS_\alpha(x)$ for $c > 0$ concludes the proof. □
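The two identities used in this proof are easy to check numerically. The following is a minimal sketch in Python with NumPy, not part of the original implementation (which was in MATLAB); the helper names `soft_shrink` and `residual_F` are hypothetical. It verifies the scaling identity $S_{c\alpha}(cx) = cS_\alpha(x)$ on random data and evaluates $F$ from (13) in the special case $K = \mathrm{id}$, where the minimizer is known in closed form as $S_\alpha(y)/(1+\beta)$.

```python
import numpy as np

def soft_shrink(x, alpha):
    """Componentwise soft shrinkage S_alpha(x)_i = max(0, |x_i| - alpha) * sign(x_i)."""
    return np.maximum(0.0, np.abs(x) - alpha) * np.sign(x)

def residual_F(x, K, y, alpha, beta):
    """F(x) = beta*x - S_alpha(-K^T (K x - y)), cf. equation (13)."""
    return beta * x - soft_shrink(-K.T @ (K @ x - y), alpha)

rng = np.random.default_rng(0)
z = rng.standard_normal(8)
c, alpha, beta = 2.5, 0.3, 0.1

# Scaling identity S_{c*alpha}(c*x) = c * S_alpha(x) for c > 0.
scaling_gap = np.max(np.abs(soft_shrink(c * z, c * alpha) - c * soft_shrink(z, alpha)))

# Special case K = identity (an assumption made only for this check):
# the minimizer is S_alpha(y) / (1 + beta), so F must vanish there.
K = np.eye(8)
y = rng.standard_normal(8)
x_star = soft_shrink(y, alpha) / (1.0 + beta)
F_at_min = residual_F(x_star, K, y, alpha, beta)
```

Both `scaling_gap` and the entries of `F_at_min` should vanish up to round-off.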
The RSSN consists of solving equation (13) by Newton's method. $F(x)$ is not differentiable
in the classical sense because of the nonsmooth shrinkage operator $S_\alpha$, and thus a generalized
notion of differentiability is required for applying Newton's method. We shall use the notion of 
Newton derivative [5J[TH]. The shrinkage operator S a turns out to be Newton differentiable. More 
precisely, we have the next result. 



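Lemma 3.2 ([15]). A Newton derivative of $S_\alpha$ is given by

\[ G(x) = \begin{pmatrix} \mathrm{id}_{\{i\in\mathbb{N}\,:\,|x_i|>\alpha\}} & 0 \\ 0 & 0 \end{pmatrix}, \]

and for any bounded linear operator $T : \ell^2 \to \ell^2$ and any $b \in \ell^2$ a Newton derivative of $S_\alpha(Tx + b)$
is given by $G(Tx + b)T$.

Hence, a Newton derivative of $F$ is given by $D(x) = \beta\,\mathrm{id} + G(-K^*(Kx - y))K^*K$. Given a set
$A \subset \mathbb{N}$, we split the operator $K^*K$ as

\[ K^*K = \begin{pmatrix} M_A & M_{AA^c} \\ M_{A^cA} & M_{A^c} \end{pmatrix}. \]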

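Upon letting $A_x := \{i \in \mathbb{N} : |K^*(Kx - y)|_i > \alpha\}$, we have

Lemma 3.3. For every $x \in \ell^2$, $D(x)$ is invertible and $\|D(x)^{-1}\|$ is uniformly bounded.

Proof. Splitting the equation $D(x)f = g$ blockwise gives

\[ \beta f|_{A_x^c} = g|_{A_x^c} \quad\text{and}\quad (\beta\,\mathrm{id}_{A_x} + M_{A_x})f|_{A_x} = g|_{A_x} - \beta^{-1}M_{A_xA_x^c}\,g|_{A_x^c}. \]

Therefore, the invertibility of $D(x)$ only depends on the invertibility of $\beta\,\mathrm{id}_{A_x} + M_{A_x}$.

Denote by $P_{A_x}$ the canonical projection $P_{A_x} : \ell^2 \to \ell^2$ which projects onto the components
listed in $A_x$. Then the matrix $M_{A_x} = P_{A_x}K^*KP_{A_x}$, and thus it is self-adjoint and positive
semidefinite. Therefore, the eigenvalues of $\beta\,\mathrm{id}_{A_x} + M_{A_x}$ are contained in the interval $[\beta, \infty)$.
Consequently, the matrix $\beta\,\mathrm{id}_{A_x} + M_{A_x}$ is invertible, and $\|(\beta\,\mathrm{id}_{A_x} + M_{A_x})^{-1}\| \le \beta^{-1}$. Now the
assertion follows from

\[
\begin{aligned}
\|D(x)^{-1}g\|
  &= \left\| \begin{pmatrix} \beta\,\mathrm{id}_{A_x} + M_{A_x} & M_{A_xA_x^c} \\ 0 & \beta\,\mathrm{id}_{A_x^c} \end{pmatrix}^{-1} g \right\| \\
  &= \left\| \begin{pmatrix} (\beta\,\mathrm{id}_{A_x} + M_{A_x})^{-1} & -\beta^{-1}(\beta\,\mathrm{id}_{A_x} + M_{A_x})^{-1}M_{A_xA_x^c} \\ 0 & \beta^{-1}\mathrm{id}_{A_x^c} \end{pmatrix} g \right\| \\
  &\le \beta^{-1}\big(\|g_{A_x}\| + \beta^{-1}\|M_{A_xA_x^c}\,g_{A_x^c}\| + \|g_{A_x^c}\|\big) \\
  &\le \beta^{-1}\big(2 + \beta^{-1}\|K^*K\|\big)\|g\|,
\end{aligned}
\]

where we have used the inequality $\|M_{A_xA_x^c}\| \le \|K^*K\|$. □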
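The bound of Lemma 3.3 can be illustrated numerically. The sketch below is a finite-dimensional Python/NumPy check, not taken from the original experiments; the dimensions, the random data and all variable names are assumptions chosen for illustration. It assembles $D(x) = \beta\,\mathrm{id} + G K^TK$ for an underdetermined $K$ (so that $K^TK$ itself is singular) and compares $\|D(x)^{-1}\|$ with $\beta^{-1}(2 + \beta^{-1}\|K^TK\|)$.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 20, 30                      # more unknowns than equations: K^T K is singular
K = rng.standard_normal((m, n))
y = rng.standard_normal(m)
x = rng.standard_normal(n)
alpha, beta = 0.5, 1e-2

KtK = K.T @ K
active = np.abs(K.T @ (K @ x - y)) > alpha   # active set A_x
G = np.diag(active.astype(float))            # Newton derivative of S_alpha at -K^T(Kx - y)
D = beta * np.eye(n) + G @ KtK               # Newton derivative of F

norm_D_inv = np.linalg.norm(np.linalg.inv(D), 2)
bound = (2.0 + np.linalg.norm(KtK, 2) / beta) / beta
```

The computed `norm_D_inv` stays below `bound` for any choice of `x`, since the estimate depends on $x$ only through the active set.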



This lemma verifies the computability of Newton iterations to find solutions of (13):

\[ x^{k+1} = x^k - D(x^k)^{-1}F(x^k) = \begin{pmatrix} (\beta\,\mathrm{id}_{A_{x^k}} + M_{A_{x^k}})^{-1}(K^*y \pm \alpha)|_{A_{x^k}} \\ 0 \end{pmatrix}. \]

In particular, this shows that the next iterate depends on the previous one only via the active set.
We are now ready to state the complete algorithm.

Step 1 Initialize: $k = 0$, $x^0 \in \ell^2$.

Step 2 Choose active set $A_{x^k} = \{i \in \mathbb{N} : |K^*(Kx^k - y)|_i > \alpha\}$ and calculate

\[ s^k_i = \begin{cases} 1, & [-K^*(Kx^k - y)]_i > \alpha, \\ -1, & [-K^*(Kx^k - y)]_i < -\alpha, \\ 0, & \text{else.} \end{cases} \]

Step 3 Update for the next iterate $x^{k+1}$:

\[ x^{k+1}|_{A_{x^k}} = (\beta\,\mathrm{id}_{A_{x^k}} + M_{A_{x^k}})^{-1}(K^*y - s^k\alpha)|_{A_{x^k}}, \qquad x^{k+1}|_{A^c_{x^k}} = 0. \qquad (14) \]

Step 4 Check stopping criteria. Return $x^{k+1}$ as a solution or set $k \leftarrow k + 1$ and repeat from Step 2.
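Steps 1 to 4 can be sketched as follows in Python with NumPy. This is a minimal finite-dimensional illustration under stated assumptions, not the authors' MATLAB code: the function names `rssn` and `soft_shrink`, the iteration cap `max_iter` and the test data are hypothetical. The loop stops once the signed active set repeats, which compares the sign pattern $s^k$ rather than the index set alone. Since the method converges only locally, the cap guards against cycling.

```python
import numpy as np

def soft_shrink(x, alpha):
    """Componentwise soft shrinkage S_alpha."""
    return np.maximum(0.0, np.abs(x) - alpha) * np.sign(x)

def rssn(K, y, alpha, beta, x0=None, max_iter=50):
    """RSSN sketch for 0.5*||Kx - y||^2 + alpha*||x||_1 + 0.5*beta*||x||_2^2."""
    n = K.shape[1]
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float).copy()
    KtK, Kty = K.T @ K, K.T @ y
    prev_s = None
    for k in range(max_iter):
        z = Kty - KtK @ x                    # -K^T (K x - y)
        active = np.abs(z) > alpha           # Step 2: active set
        s = np.sign(z) * active              # signs s^k, zero off the active set
        if prev_s is not None and np.array_equal(s, prev_s):
            return x, k                      # signed active set unchanged: x is a fixed point
        idx = np.flatnonzero(active)
        x = np.zeros(n)                      # Step 3: zero off the active set
        if idx.size:
            A = beta * np.eye(idx.size) + KtK[np.ix_(idx, idx)]
            x[idx] = np.linalg.solve(A, Kty[idx] - alpha * s[idx])
        prev_s = s
    return x, max_iter

# Usage on a matrix with orthonormal columns (an assumption that keeps the
# example well inside the region of convergence).
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((40, 20)))
y = rng.standard_normal(40)
alpha, beta = 0.2, 0.05
x_hat, n_steps = rssn(Q, y, alpha, beta)
residual = beta * x_hat - soft_shrink(-Q.T @ (Q @ x_hat - y), alpha)   # F(x_hat)
```

At the returned point the residual $F$ from (13) should vanish up to round-off. Only the restricted system on the active set is solved, which is where the term $\beta\,\mathrm{id}$ enters.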

A natural stopping criterion for the algorithm is the change of the active set. If it does not 
change for two subsequent iterations, then a minimizer has been attained. The next result is well 
known and included for completeness. 

Theorem 3.4. Let $K : \ell^2 \to \mathcal{H}_2$ and $\alpha, \beta > 0$. The RSSN converges locally superlinearly.

Proof. Let $x^*$ be the minimizer of $\Phi_{\alpha,\beta}$. Using the above lemmas and $F(x^*) = 0$ we have

\[
\begin{aligned}
\|x^{k+1} - x^*\|_{\ell^2} &= \|x^k - D(x^k)^{-1}F(x^k) - x^*\|_{\ell^2} \\
  &= \|x^k - D(x^k)^{-1}F(x^k) - x^* + D(x^k)^{-1}F(x^*)\|_{\ell^2} \\
  &\le \|D(x^k)^{-1}\|\,\|D(x^k)(x^k - x^*) - F(x^k) + F(x^*)\|.
\end{aligned}
\]

The definition of Newton derivative implies

\[ \lim_{x\to x^*} \frac{\|F(x) - F(x^*) - D(x)(x - x^*)\|}{\|x - x^*\|_{\ell^2}} = 0 \]

and hence for arbitrary $\epsilon > 0$ and $\|x^k - x^*\|_{\ell^2}$ sufficiently small we have

\[ \|D(x^k)^{-1}\|\,\|D(x^k)(x^k - x^*) - F(x^k) + F(x^*)\| \le \|D(x^k)^{-1}\|\cdot\epsilon\|x^k - x^*\|_{\ell^2}, \]

which, together with the uniform bound on $\|D(x^k)^{-1}\|$ from Lemma 3.3, shows the desired superlinear local convergence. □

Remark 3.5. Several comments on the algorithm are in order. Firstly, this algorithm differs
from the classical SSN [15] only in the regularization of the equation in Step 3. Secondly, the
proposed RSSN method is different from the standard regularized Newton method (also known as
the Levenberg-Marquardt method) via

\[ x^{k+1} = x^k - (D(x^k) + \eta\,\mathrm{id})^{-1}F(x^k), \]

for some $\eta > 0$, in that the latter regularizes globally whereas the former regularizes only on the
active set. Thirdly, there are several equivalent reformulations of the minimization problem. For
instance, multiplying (12) by $\gamma > 0$ and adding $x$ gives

\[ x - \gamma K^*(Kx - y) - \gamma\beta x \in x + \gamma\alpha\,\mathrm{Sign}(x), \qquad (15) \]

and also an alternative characterization of a minimizer of $\Phi_{\alpha,\beta}$: $x - S_{\gamma\alpha}(x - \gamma K^*(Kx - y) - \gamma\beta x) = 0$.
It leads to a similar algorithm but with a different active set, i.e.

\[ A_x = \{i \in \mathbb{N} : |x - \gamma K^*(Kx - y) - \gamma\beta x|_i > \gamma\alpha\}. \]

Another choice of the active set derives by rewriting (15) as $x - \gamma K^*(Kx - y) \in (1 + \gamma\beta)x + \gamma\alpha\,\mathrm{Sign}(x)$.
This gives $(1 + \gamma\beta)x - S_{\gamma\alpha}(x - \gamma K^*(Kx - y)) = 0$, and also a third choice of the
active set

\[ A_x = \{i \in \mathbb{N} : |x - \gamma K^*(Kx - y)|_i > \gamma\alpha\}. \]

These different choices may affect the convergence behavior of the respective algorithms.
3.2 Regularized FSS (RFSS)

The main drawback of the RSSN is its potential lack of global convergence. Globalization may
be achieved by adopting alternative selection strategies for the active set. The RFSS is one such
example. It derives from the FSS [21] as the RSSN from the SSN. In this section we will describe
the RFSS algorithm in detail and show the next convergence result. For simplicity, we consider
only finite-dimensional problems: $K : \mathbb{R}^s \to \mathbb{R}^m$, $y \in \mathbb{R}^m$ and $\underline{s} = \{1, 2, \dots, s\}$.

Theorem 3.6. The RFSS converges globally in finitely many steps; moreover, every iteration
strictly decreases the value of the functional $\Phi_{\alpha,\beta}$.

We shall need the notion of consistency, which plays a fundamental role in the RFSS. 

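Definition 3.7. Let $A \subset \underline{s}$, $x = (x_i)_{i\in\underline{s}} \in \mathbb{R}^s$ and $\theta = (\theta_i)_{i\in\underline{s}} \in \{-1, 0, 1\}^s$. The triple $(A, x, \theta)$ is
called consistent if

\[ i \in A \;\Rightarrow\; \mathrm{sign}(x_i) = \theta_i \neq 0, \qquad i \in A^c \;\Rightarrow\; x_i = \theta_i = 0. \]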
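Definition 3.7 translates directly into a componentwise test. A minimal Python/NumPy sketch follows, with the hypothetical helper name `is_consistent` and the active set represented as a boolean mask (a representation chosen here for convenience, not prescribed by the text).

```python
import numpy as np

def is_consistent(active, x, theta):
    """Check Definition 3.7: on the active set sign(x_i) = theta_i != 0,
    off the active set x_i = theta_i = 0."""
    active = np.asarray(active, dtype=bool)
    x = np.asarray(x, dtype=float)
    theta = np.asarray(theta)
    on_active = np.all((np.sign(x[active]) == theta[active]) & (theta[active] != 0))
    off_active = np.all((x[~active] == 0) & (theta[~active] == 0))
    return bool(on_active and off_active)

ok = is_consistent([True, False, True], [1.5, 0.0, -2.0], [1, 0, -1])
# An active index whose coefficient is still zero violates the definition.
not_ok = is_consistent([True, False, True], [0.0, 0.0, -2.0], [1, 0, -1])
```

The second call shows why a triple is not consistent right after a new index has been added to the active set with a zero coefficient.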



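With a consistent triple $(A, x, \theta)$ we can split the optimality condition (12) into

\[ (-K^*(Kx - y) - \beta x)_i = \alpha\theta_i, \quad i \in A, \qquad (16) \]
\[ |K^*(Kx - y)|_i \le \alpha, \quad i \in A^c. \qquad (17) \]

Remark 3.8. The formulas (16) and (17) correspond to the optimality condition for the following
auxiliary functional

\[ \Phi_{\alpha,\beta,\theta}(x) = \tfrac12\|Kx - y\|^2 + \alpha\langle x, \theta\rangle + \tfrac{\beta}{2}\|x\|^2_{\ell^2}. \qquad (18) \]

By the definition of consistency, $\Phi_{\alpha,\beta}(x) = \Phi_{\alpha,\beta,\theta}(x)$ if $\mathrm{sign}(x_i) = \theta_i$ for all nonzero components
of $x$. In any case we have

\[ \Phi_{\alpha,\beta,\theta}(x) \le \Phi_{\alpha,\beta}(x), \]

since $\langle x, \theta\rangle \le \|x\|_{\ell^1}$ for every $\theta \in \{-1, 0, 1\}^s$.

Now we are ready to describe the complete RFSS algorithm in five steps. The description will
also provide a constructive proof of Theorem 3.6.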



Step 1 Initialize: $k = 1$, $A_0 = \emptyset$, $x^0 = 0$ and $\theta^0 = 0$. Any consistent triple $(A_0, x^0, \theta^0)$ is valid for
initialization. Then check the optimality condition (12) and take one of the actions

(i) return the solution if fulfilled;

(ii) continue with Step 2 if (17) is not fulfilled;

(iii) continue with Step 3 otherwise.

Step 2 At this step, the following premises hold: The optimality condition (17) is not fulfilled and
the triple $(A_{k-1}, x^{k-1}, \theta^{k-1})$ is consistent. This step performs a greedy scheme by selecting
the index $i_0^k$ violating condition (17) the most, i.e.

\[ i_0^k \in \arg\max_i\; |K^*(Kx^{k-1} - y)|_i - \alpha. \]

Then update the active set by $A_k = A_{k-1} \cup \{i_0^k\}$, update $\theta^k$ by

\[ \theta^k_i = \begin{cases} \theta^{k-1}_i, & i \neq i_0^k, \\ -\mathrm{sign}\big((K^*(Kx^{k-1} - y))_i\big), & i = i_0^k, \end{cases} \]

and continue with Step 3.

Step 3 Calculate the next iterate $x^k$ such that (16) is fulfilled, i.e. $x^k$ is optimal for $\Phi_{\alpha,\beta,\theta^k}$, by

\[ x^k|_{A_k} = (\beta\,\mathrm{id} + M_{A_k})^{-1}(K^*y - \alpha\theta^k)|_{A_k} \quad\text{and}\quad x^k|_{A_k^c} = 0. \]

Observe that the update coincides with that in the RSSN. If the triple $(A_k, x^k, \theta^k)$ is consistent,
continue with Step 5, and otherwise continue with Step 4. For the former, we deduce
from Remark 3.8 that

\[ \Phi_{\alpha,\beta}(x^k) = \Phi_{\alpha,\beta,\theta^k}(x^k) < \Phi_{\alpha,\beta,\theta^k}(x^{k-1}) \le \Phi_{\alpha,\beta}(x^{k-1}). \]

Step 4 This step handles inconsistent $(A_k, x^k, \theta^k)$. We consider two different cases separately.

Case 1 The preceding step of Step 3 is Step 4, i.e. $(A_k, x^{k-1}, \theta^k)$ is consistent. Therefore, there
must be at least one index such that the signs of $x^k$ and $x^{k-1}$ differ. Let $\lambda_0$ be the smallest
$\lambda \in (0, 1)$ such that

\[ \exists\, i_0 \in A_k : \; (\lambda x^k + (1 - \lambda)x^{k-1})_{i_0} = 0, \]

and denote $x_{\lambda_0} = \lambda_0x^k + (1 - \lambda_0)x^{k-1}$. Now the convexity of $\Phi_{\alpha,\beta,\theta^k}$ implies

\[
\begin{aligned}
\Phi_{\alpha,\beta}(x_{\lambda_0}) &= \Phi_{\alpha,\beta,\theta^k}(x_{\lambda_0}) \\
  &\le \lambda_0\Phi_{\alpha,\beta,\theta^k}(x^k) + (1 - \lambda_0)\Phi_{\alpha,\beta,\theta^k}(x^{k-1}) \\
  &< \lambda_0\Phi_{\alpha,\beta,\theta^k}(x^{k-1}) + (1 - \lambda_0)\Phi_{\alpha,\beta,\theta^k}(x^{k-1})
\end{aligned}
\]

by the minimizing property of $x^k$ for $\Phi_{\alpha,\beta,\theta^k}$. Now we update $(A_k, x^k, \theta^k)$ by

\[ x^k \leftarrow x_{\lambda_0}, \qquad A_k \leftarrow \{i \in \underline{s} : x^k_i \neq 0\}, \qquad \theta^k \leftarrow \mathrm{sign}(x^k) \]

and check equation (16). If fulfilled continue with Step 5, otherwise increase $k$ by one
and continue with Step 3.

Case 2 The preceding step of Step 3 is Step 2, i.e. $|K^*(Kx^{k-1} - y)|_{i_0^k} > \alpha$ and $x^{k-1}_{i_0^k} = 0$. The
choice of $\theta^k_{i_0^k}$ implies

\[ \mathrm{sign}\big((\nabla\Phi_{\alpha,\beta,\theta^k}(x^{k-1}))_{i_0^k}\big) = \mathrm{sign}\big((K^*(Kx^{k-1} - y))_{i_0^k} + \alpha\theta^k_{i_0^k}\big) = -\theta^k_{i_0^k}, \]

and moreover,

\[ \nabla\Phi_{\alpha,\beta,\theta^k}(x^{k-1})|_{A_{k-1}} = 0. \]

Now the Taylor expansion of $\Phi_{\alpha,\beta,\theta^k}$ at $x^{k-1}$ yields, up to higher-order terms, that for $x$ near to $x^{k-1}$ with

\[ 0 > \Phi_{\alpha,\beta,\theta^k}(x) - \Phi_{\alpha,\beta,\theta^k}(x^{k-1}) = (\nabla\Phi_{\alpha,\beta,\theta^k}(x^{k-1}))_{i_0^k}(x - x^{k-1})_{i_0^k} = (\nabla\Phi_{\alpha,\beta,\theta^k}(x^{k-1}))_{i_0^k}\,x_{i_0^k}, \]

by observing $(x^{k-1})_{i_0^k} = 0$, which consequently implies

\[ 0 > -\theta^k_{i_0^k}x_{i_0^k} \;\Rightarrow\; \theta^k_{i_0^k} = \mathrm{sign}(x_{i_0^k}). \]

The minimizing property in Step 3 implies that $\Phi_{\alpha,\beta,\theta^k}(x^k) < \Phi_{\alpha,\beta,\theta^k}(x^{k-1})$, which
further implies together with the convexity of $\Phi_{\alpha,\beta,\theta^k}$ that there exists an $x$ near to $x^{k-1}$
on the line segment from $x^{k-1}$ to $x^k$ such that $\Phi_{\alpha,\beta,\theta^k}(x) < \Phi_{\alpha,\beta,\theta^k}(x^{k-1})$. Consequently,

\[ \mathrm{sign}(x^k_{i_0^k}) = \theta^k_{i_0^k}. \]

Thus there can be a sign change from $x^{k-1}$ to $x^k$ only on a component other than $i_0^k$.
Now we continue analogously to Case 1.

Step 5 At this step, the following premises are fulfilled: $(A_k, x^k, \theta^k)$ is consistent and the optimality
condition (16) is fulfilled. Check (17). If fulfilled, stop, otherwise continue with Step 2.



From the strictly reducing property of the algorithm we know that every possible active set is 
attained at most once. This guarantees the convergence in finitely many steps. We observe that 
the reduction properties hold also for infinite-dimensional problems. 
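The interpolation used in Step 4, Case 1 amounts to finding the first point on the segment from $x^{k-1}$ to $x^k$ at which a coefficient vanishes. The following Python/NumPy sketch illustrates that single ingredient only, not the full RFSS; the function name `first_zero_crossing` is hypothetical, and only strict sign changes are treated as crossings, which is an assumption of this sketch.

```python
import numpy as np

def first_zero_crossing(x_old, x_new):
    """Smallest lambda in (0, 1) at which a component of
    lambda*x_new + (1 - lambda)*x_old vanishes, together with that point.
    Returns (None, copy of x_new) if no component changes sign strictly."""
    x_old = np.asarray(x_old, dtype=float)
    x_new = np.asarray(x_new, dtype=float)
    crosses = x_old * x_new < 0                     # strict sign change along the segment
    if not crosses.any():
        return None, x_new.copy()
    lam = x_old[crosses] / (x_old[crosses] - x_new[crosses])
    lam0 = lam.min()
    x_lam = lam0 * x_new + (1.0 - lam0) * x_old
    x_lam[np.flatnonzero(crosses)[lam.argmin()]] = 0.0   # remove round-off at the crossing
    return lam0, x_lam

lam0, x_lam = first_zero_crossing([1.0, -2.0, 0.5], [-1.0, -1.0, 1.5])
new_active = x_lam != 0          # A_k <- {i : x_i != 0}
new_theta = np.sign(x_lam)       # theta^k <- sign(x^k)
```

In this example the first component changes sign halfway along the segment, so `lam0` equals 0.5 and that index leaves the active set.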

Remark 3.9. Both algorithms, RSSN as well as RFSS, also work for weighted $\ell^1$ norms, i.e.
replacing $\alpha\|x\|_{\ell^1}$ by $\sum_i \alpha_i|x_i|$ in the definition of $\Phi_{\alpha,\beta}$, where $\alpha_i \ge c > 0$ for some constant $c$. All
theoretical results remain valid for this case.

Remark 3.10. The symmetric matrix in Step 3 of the RFSS changes only by one row and one 
column at almost every iteration. Therefore, it is advisable to use the Cholesky factorization to 
solve the equation because of its straightforward update and reduction in computational efforts. 
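The update mentioned in Remark 3.10 can be sketched as follows: when one index joins the active set, the Cholesky factor of $\beta\,\mathrm{id} + M_{A_k}$ is extended by a single row instead of being recomputed. This is a minimal Python/NumPy illustration with the hypothetical helper name `cholesky_append`; a production code would use a dedicated triangular solver for the forward substitution rather than the general `np.linalg.solve` used here.

```python
import numpy as np

def cholesky_append(L, a, d):
    """Given lower-triangular L with A = L @ L.T, return the Cholesky factor of
    the bordered matrix [[A, a], [a^T, d]] without refactorizing A."""
    n = L.shape[0]
    l = np.linalg.solve(L, a) if n > 0 else np.zeros(0)   # solves L l = a
    lam = np.sqrt(d - l @ l)                               # positive if the bordered matrix is SPD
    L_new = np.zeros((n + 1, n + 1))
    L_new[:n, :n] = L
    L_new[n, :n] = l
    L_new[n, n] = lam
    return L_new

# Example data (assumed for illustration): three active columns of a random K.
rng = np.random.default_rng(0)
K = rng.standard_normal((10, 6))
beta = 0.1
idx = [3, 5, 1]                                   # active indices in order of insertion
A_full = beta * np.eye(3) + K[:, idx].T @ K[:, idx]
L2 = np.linalg.cholesky(A_full[:2, :2])           # factor before the third index joins
L3 = cholesky_append(L2, A_full[:2, 2], A_full[2, 2])
```

The factor `L3` agrees with a fresh factorization of the enlarged matrix, while the update itself costs only one triangular solve.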

4 Numerical experiments 

In this section we compare classical $\ell^1$-minimization algorithms and their elastic-net counterparts
for both well- and ill-conditioned operator equations. For qualitative properties of elastic-net
regularization compared to classical $\ell^1$-regularization, we refer to [30], and for in-depth comparisons
of existing $\ell^1$ algorithms, we refer to [THIHI]. We only aim at illustrating algorithmic differences
between SSN and RSSN or FSS and RFSS. All the algorithms were implemented in MATLAB
R2008a and run on an AMD Athlon 64 X2 Dual Core Processor 3800+ running 64-bit
Linux.

4.1 Test 1: Well conditioned operators and absence of noise 

For our first test, $K$ is a 400 x 400 Gaussian random matrix with its
columns normalized to unit norm. This gives rise to a well-conditioned operator and hence should
pose no problem to classical SSN and FSS. The exact solution $x^\dagger$ is the zero vector with every
10th entry set to 1. The simulated exact data is then generated by $y^\dagger = Kx^\dagger$.

To study the influence of the parameter $\beta$, we fix the value of the parameter $\alpha$ at $\alpha = 10^{-5}$. The
numerical results for one typical realization of the random matrix are summarized in Table 1. In
the table, $x^*$ denotes the minimizer computed by the algorithm at hand, $e_{x^*} := \|x^\dagger - x^*\|_{\ell^2}/\|x^\dagger\|_{\ell^2}$
denotes the relative error, $\#A_{x^*}$ refers to the size of the active set $A_{x^*}$, indicating the sparsity
of the solution $x^*$, and the computing time is measured in milliseconds (ms). Note that $\beta = 0$
corresponds to the classical $\ell^1$ algorithms.

The parameter $\beta$ significantly affects the sparsity of the minimizer, especially in case of larger
values, e.g. $\beta = 2^{-12}$. This value makes the $\ell^2$ term dominate the $\ell^1$ term in
the functional, and thus completely destroys the desired sparsity. It also greatly
degrades the reconstruction accuracy and the computational efficiency. The latter is due to the fact
that more iterations are needed to accurately resolve all the entries in the active set. This is the
case for both RFSS and RSSN. For small values of $\beta$, the computing time changes only slightly.

4.2 Test 2: Rank-deficient operators and absence of noise 

The next test demonstrates the stability of elastic-net in the more challenging case of ill-conditioned
or rank-deficient operators. We use the same setting as for Test 1, but set columns 201 to 400 of
the random matrix $K$ the same as columns 1 to 200. This gives rise to a rank-deficient matrix.
The numerical results for one exemplary random matrix are shown in Table 2.

As expected, both SSN and FSS fail completely as a consequence of inverting rank-deficient
submatrices. In sharp contrast to these classical $\ell^1$ algorithms, their elastic-net counterparts
remain robust so long as the $\beta$ value is not exceedingly small. These algorithms converge and give
results with accuracy comparable to the well-conditioned case, see Tables 1 and 2.
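The role of the $\ell^2$ term for a rank-deficient operator can be illustrated on a matrix built as in Test 2. The following Python/NumPy sketch is not the original MATLAB experiment: the random seed, the reading of "every 10th entry" as 1-based indices 10, 20, ..., and the choice $\beta = 2^{-16}$ are assumptions made for illustration. It shows that $K^TK$ has rank 200 while the regularized matrix $\beta\,\mathrm{id} + K^TK$ has all eigenvalues bounded below by $\beta$.

```python
import numpy as np

rng = np.random.default_rng(0)
K = rng.standard_normal((400, 400))
K[:, 200:] = K[:, :200]                  # columns 201..400 duplicate columns 1..200
K /= np.linalg.norm(K, axis=0)           # unit-norm columns

x_true = np.zeros(400)
x_true[9::10] = 1.0                      # every 10th entry set to 1
y_exact = K @ x_true                     # exact data

KtK = K.T @ K
beta = 2.0 ** -16
rank_KtK = np.linalg.matrix_rank(KtK)
smallest_eig_plain = np.linalg.eigvalsh(KtK)[0]
smallest_eig_reg = np.linalg.eigvalsh(KtK + beta * np.eye(400))[0]
```

Any active set containing a duplicated pair of columns leads to a singular submatrix of $K^TK$, whereas the shifted submatrix solved in the regularized methods remains invertible.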
Table 1: Numerical results for Test 1: a well conditioned problem with exact data.

                                         RFSS                      RSSN
$\beta$     $\#A_{x^*}$   $e_{x^*}$    #iterations  time(ms)     #iterations  time(ms)
0           70            6.61e-7      80           83           7            82
2^{-30}     70            6.64e-7      80           85           7            83
2^{-28}     70            6.75e-7      80           91           8            87
2^{-24}     95            9.44e-7      105          107          7            84
2^{-20}     181           9.68e-6      193          308          8            98
2^{-16}     186           1.68e-4      200          330          7            94
2^{-12}     388           8.59e-2      556          3490         15           287


Table 2: Numerical results for Test 2: a rank-deficient problem with exact data.

                                         RFSS                      RSSN
$\beta$     $\#A_{x^*}$   $e_{x^*}$    #iterations  time(ms)     #iterations  time(ms)
2^{-24}     70            4.65e-7      96           98
2^{-20}     218           2.61e-6      218          461          5            96
2^{-16}     218           4.23e-5      220          449          5            93
2^{-12}     360           1.03e-3      368          1670         6            142


For both Tests 1 and 2, we observe that RSSN typically takes fewer iterations than RFSS, but
it works on bigger active sets during the iteration. Despite this apparent difference, the computing
time for both algorithms is practically identical on these datasets in case of small $\beta$ values. For
larger $\beta$ values, the support of the minimizer gets larger and hence RFSS needs considerably more
iterations and consequently more computing time, whereas for RSSN the number of iterations
stays small because the active set can change more dramatically.

Finally, we would like to note that for both tests, the variation of the numerical results with 
respect to different realizations of the random matrix K as well as its size is fairly small, and 
thus the algorithms are statistically robust. We omitted the results for these variants as they are 
similar to those presented herein. 

4.3 Test 3: Presence of noise 

Next we investigate the practically more relevant case of noisy data. To this end, we again use the
settings of Tests 1 and 2 but add 5% Gaussian noise to the exact data to get $y^\delta$. The value
of the regularization parameter $\alpha$ is set to $\alpha = \delta = \|y^\delta - y^\dagger\|$. In addition, we also keep track
of the error $e_{Kx^*} := \|y^\dagger - Kx^*\|/\|y^\dagger\|$. To assess the statistical performance of the algorithms,
we repeat the experiment 100 times and calculate the mean values. The numerical results for
the well-conditioned operator are shown in Table 3. Analogous results can be obtained for the
rank-deficient operator, see Table 4.
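One way to generate such noisy data is sketched below in Python with NumPy. How the 5% level is normalized is not specified in the text; here the noise is scaled so that $\|y^\delta - y^\dagger\| = 0.05\,\|y^\dagger\|$, which is an assumption, as are the helper name `add_relative_noise` and the random stand-in for the exact data.

```python
import numpy as np

def add_relative_noise(y_exact, level, rng):
    """Return y_exact plus Gaussian noise scaled so that ||noise|| = level * ||y_exact||."""
    noise = rng.standard_normal(y_exact.shape)
    noise *= level * np.linalg.norm(y_exact) / np.linalg.norm(noise)
    return y_exact + noise

rng = np.random.default_rng(0)
y_exact = rng.standard_normal(400)            # stand-in for the exact data K x^dagger
y_delta = add_relative_noise(y_exact, 0.05, rng)
delta = np.linalg.norm(y_delta - y_exact)     # noise level; Test 3 sets alpha = delta
relative_noise = delta / np.linalg.norm(y_exact)
```

With this convention the regularization parameter $\alpha = \delta$ is known exactly for each noise realization.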

We observe similar performance for the algorithms in terms of the number of iterations and
computation time as in the noise-free case. But the parameter $\beta$ now plays a far less influential role.
This is attributed to the larger residual $\|Kx^* - y^\delta\|$. The presence of noise in the data inevitably
degrades the accuracy of the results, compare Table 3 with Table 1. In the well-conditioned case,
the elastic-net algorithms give results with sparsity and computing time comparable to
those of the classical $\ell^1$ algorithms. However, incorporating the $\ell^2$ term into the functional can
improve the accuracy of the result and, more importantly, restore the stability of the algorithms,
which is especially evident in case of rank-deficient operators, see Table 4.
Table 3: Numerical results for Test 3: a well conditioned problem with noisy data.

                                                      RFSS                      RSSN
$\beta$    $\#A_{x^*}$   $e_{x^*}$   $e_{Kx^*}$     #iterations  time(ms)     #iterations  time(ms)
0          67.74         3.02        0.29           67.74        71.01        8.29         75.03
2^{-8}     68.08         2.98        0.28           68.08        76.59        7.36         75.88
2^{-5}     70.43         2.72        0.26           70.43        79.25        6.59         68.22
2^{-3}     77.01         2.16        0.21           77.01        80.30        5.92         62.05
2^{-1}     96.28         1.44        0.15           96.28        104.76       5.08         62.83


Table 4: Numerical results for Test 3: a rank-deficient problem with noisy data.

                                                      RFSS                      RSSN
$\beta$    $\#A_{x^*}$   $e_{x^*}$   $e_{Kx^*}$     #iterations  time(ms)     #iterations  time(ms)
2^{-8}     86.14         1.80        0.24           86.14        87.76        5.93         61.81
2^{-5}     87.30         1.73        0.23           87.30        89.67        5.71         61.19
2^{-3}     91.66         1.55        0.21           91.66        97.57        5.40         61.71
2^{-1}     106.40        1.24        0.17           106.40       121.42       4.86         60.81


4.4 Test 4: Convergence rates 



The next experiment studies the convergence rates with respect to the noise level $\delta$, see
Theorems 2.14 and 2.17. Here we utilize the blur problem from MATLAB regularization tools [17] with
the following parameters: image size 50 x 50, band 5, sigma 0.7. We calculate the minimizer $x^*$
of $\Phi_{\alpha,\beta}$ using RSSN with $\alpha = \delta$ and each $\beta \in \{0, \alpha/4, \alpha/2, \alpha\}$. Figure 1 displays the noise levels
$\delta$ (x-axis) and the respective errors $\|x^\delta_{\alpha,\beta} - x^\dagger\|_{\ell^2}$ (y-axis) in a doubly logarithmic scale. In the
figure, the lines from bottom to top correspond respectively to the results for $\beta = 0$, $\alpha/4$, $\alpha/2$ and
$\alpha$.

The results shown in Figure 1 corroborate the estimate (9) in Remark 2.18. For large $\beta$ values
and high noise levels we observe $\|x^\delta_{\alpha,\beta} - x^\dagger\|_{\ell^2} \approx C\delta^{0.61}$, which is in agreement with the
square-root-like estimate, while in case of lower noise levels we observe $\|x^\delta_{\alpha,\beta} - x^\dagger\|_{\ell^2} \approx C\delta^{0.99}$, i.e. the
slope is close to one, which corresponds to the improved convergence rate of $O(\delta)$.
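Exponents such as 0.61 and 0.99 are read off as slopes in the log-log plot. A minimal Python/NumPy sketch of such a fit is given below; the helper name `loglog_slope` is hypothetical, and the error values are synthetic, generated exactly from power laws, so they are not the data behind Figure 1.

```python
import numpy as np

def loglog_slope(deltas, errors):
    """Least-squares slope of log(errors) against log(deltas)."""
    slope, _ = np.polyfit(np.log(deltas), np.log(errors), 1)
    return slope

# Synthetic errors following C * delta**p exactly (illustration only).
deltas = np.logspace(-10, -2, 9)
errors_sqrt_like = 3.0 * deltas ** 0.61
errors_linear = 3.0 * deltas ** 0.99
slope_a = loglog_slope(deltas, errors_sqrt_like)
slope_b = loglog_slope(deltas, errors_linear)
```

For measured errors the fit would be restricted to the range of noise levels in which the curve is approximately straight.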

Figure 1: Log-log plot of the reconstruction error versus the noise level $\delta$. The lines correspond
from bottom to top to the error using $\alpha = \delta$ and $\beta = 0$, $\alpha/4$, $\alpha/2$, $\alpha$.

Table 5: Numerical results for Test 4: blur problem with various levels of noise in the data.

                                  $e_{x^*}$
log($\delta$)   $\beta = \alpha$   $\beta = \alpha/2$   $\beta = \alpha/4$   $\beta = 0$
-9              0.31               0.25                 0.20
-7              0.46               0.41                 0.35
-5              0.73               0.67                 0.63                 1.13
-3              1.00               1.00                 1.00                 1.00

Increasing the value of the standard deviation, i.e. sigma, of the blur function makes the
problem more ill-posed. The numerical results for sigma = 10 are shown in Table 5. One can clearly
see the regularizing effect of elastic-net compared to classical $\ell^1$ minimization: The algorithms for
the latter do not converge for low noise levels, i.e. small $\alpha$ values and $\beta = 0$. As the noise
level decreases, one observes the growing influence of the ill-conditioning of the operator, which consequently
leads to numerical difficulties for the classical $\ell^1$ algorithms.

Finally we add 5% noise to the blurred image. The reconstructed images for $\alpha = 4\times10^{-4}$ and
different $\beta$ values are shown in Figure 2. For this example none of the tested $\ell^1$ algorithms would
converge. However, a path-following strategy can remedy the problem: decreasing the $\beta$ value
gradually, and using the elastic-net reconstruction for a larger $\beta$ value as the initial guess for the
RSSN iterations with a smaller $\beta$ value. This is in accordance with Proposition 2.2. Numerically,
by iterating this procedure we can then obtain an acceptable $\ell^1$-reconstruction. The reconstructions
also clearly show the qualitative differences between elastic-net and $\ell^1$-minimization: For the
former, neighboring pixels tend to feature groupwise structure, whereas for the latter, neighboring
pixels behave more or less independently of each other.
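The path-following idea can be sketched as a warm-started loop over decreasing $\beta$ values. The Python/NumPy sketch below deliberately swaps the inner solver: it uses plain proximal-gradient (iterated soft-shrinkage) steps instead of RSSN, because these decrease the functional from any starting point and keep the example self-contained. The function names, the problem size, the $\beta$ schedule and the iteration count are all assumptions for illustration.

```python
import numpy as np

def soft_shrink(x, alpha):
    return np.maximum(0.0, np.abs(x) - alpha) * np.sign(x)

def objective(x, K, y, alpha, beta):
    r = K @ x - y
    return 0.5 * r @ r + alpha * np.abs(x).sum() + 0.5 * beta * x @ x

def prox_gradient(K, y, alpha, beta, x0, n_iter=200):
    """Proximal-gradient iterations on the elastic-net functional (stand-in for RSSN)."""
    step = 1.0 / (np.linalg.norm(K, 2) ** 2 + beta)   # 1 / Lipschitz constant of the smooth part
    x = x0.copy()
    for _ in range(n_iter):
        grad = K.T @ (K @ x - y) + beta * x
        x = soft_shrink(x - step * grad, step * alpha)
    return x

def continuation(K, y, alpha, betas, n_iter=200):
    """Solve for a decreasing sequence of beta values, warm-starting each solve."""
    x = np.zeros(K.shape[1])
    path = []
    for beta in betas:
        x = prox_gradient(K, y, alpha, beta, x, n_iter)
        path.append((beta, x.copy()))
    return path

rng = np.random.default_rng(0)
K = rng.standard_normal((30, 60))
y = rng.standard_normal(30)
alpha = 0.1
betas = [2.0 ** -p for p in (2, 4, 6, 8)]
path = continuation(K, y, alpha, betas)
```

Each entry of `path` holds a $\beta$ value and the corresponding approximate minimizer, and the last entries approximate the $\ell^1$ solution as $\beta$ becomes small.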

5 Conclusion 

We analyzed the elastic-net regularization from an "inverse problem" point of view. Using classical
and modern techniques we showed that elastic-net regularization combines the best of both $\ell^2$- and
$\ell^1$-regularization, i.e. the good convergence rate of $\ell^1$-regularization and modest constants in the
error estimates from $\ell^2$-regularization. Moreover, we also showed that the a posteriori parameter
choice due to Morozov also works for elastic-net regularization and leads to the same convergence
rates as our a priori choice. Large parts of our analysis were based on a linear coupling of the two
regularization parameters. However, Theorem 2.7 indicates that an asymptotic linear coupling
of the parameters would suffice. From Example 2.8 one may conjecture that there is a critical
value of the coupling constant $\eta$ such that for all larger values the minimal-$\eta\|\cdot\|_{\ell^1} + \tfrac12\|\cdot\|^2_{\ell^2}$-solution
coincides with the minimal-$\|\cdot\|_{\ell^1}$-solution. This would provide a further justification for
the elastic-net functional.

We have also developed two active set methods for minimizing the elastic-net functional and
numerically confirmed their excellent performance. We may state that elastic-net is on par with
classical $\ell^1$ minimization in terms of relative error, sparsity and computation time for well
conditioned problems and is favorable for ill-conditioned problems.
Figure 2: A true image and its blurry and noisy measurement together with elastic-net
reconstructions for $\alpha = 1\times10^{-4}$ and different values of $\beta$ and the $\ell^1$-reconstruction.